Fear Not the Data Tsunami—Object Storage is Here

by Michael St-Jean, Red Hat

We’ve been hearing it for years—the looming concern that data is growing exponentially, fueled by the Internet of Things, data governance, 5G networks, and the like. But if some data is good, more must be better. Right…?

Only if you’re prepared to scale and unleash the promise of object data services.

How do I store all this stuff?

It’s a curious thing, how we categorize and label items so we can find them quickly. Go to your attic, for example. You might have boxes labeled “toys,” “photo albums,” “sporting equipment.” In that last one you might find golf balls, hockey pucks, volleyballs, rappelling gear. In data terms, this is a typical “file/folder” approach, one you’ll find on your computer. We create files for things like documents, photos, and presentations and give those files metadata tags (names) to help us remember what those files are. And then we organize those files in folders (boxes).

There’s got to be a better way…

At some point, someone got creative when associating bits of information… like your friends and family with their phone numbers… or for businesses, your customers with their various sites and contacts. Then they looked at all the products they carry, the inventory, price list, etc., and decided to cross-reference this with their customers. And thus the relational database was born.
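That cross-referencing idea is easy to see in miniature. Here's a small, illustrative sketch using Python's built-in sqlite3 module; the table names, columns, and data are hypothetical, invented just to show how a join answers "which customer bought what?" in one query.

```python
import sqlite3

# Illustrative schema: customers cross-referenced with products via orders.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE orders    (customer_id INTEGER, product_id INTEGER, qty INTEGER);
""")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme Sports"), (2, "Summit Outfitters")])
con.executemany("INSERT INTO products VALUES (?, ?, ?)",
                [(1, "golf balls", 2.50), (2, "hockey pucks", 1.75)])
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 100), (2, 2, 40)])

# One query joins all three tables to answer "who bought what, and how many?"
rows = con.execute("""
    SELECT c.name, p.name, o.qty
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
""").fetchall()
```

The relations do the work: no customer record duplicates product data, and vice versa.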

This paradigm is manageable with a few variables, but what if you’re a sporting-goods manufacturer and you’re selling golf balls and hockey pucks… yes, even rappelling gear, to retailers across the country—or across the globe? Your database file is going to get really big—really fast.

Block storage to the rescue!

No problem. Other crafty rascals decided to break their databases into manageable parts, or blocks, and then write indexes to help the database software figure out where to look for something. In essence, there was a map to the desired data, even when a search was massive. So if you wanted “sales of basketballs in the US Northwest,” for example, you wouldn’t have to sift through rappelling gear sold in the Ozarks. Thank you, block storage!
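A toy sketch can make the idea concrete. In this hypothetical example, records are packed into fixed-size blocks and an index maps each key to the block that holds it, so a lookup reads one block instead of scanning the whole dataset (real block storage works at a much lower level, but the indexing principle is the same).

```python
BLOCK_SIZE = 3  # records per block, kept tiny for illustration

# (product, region) records, standing in for database rows
records = [("basketballs", "US Northwest"), ("rappelling gear", "Ozarks"),
           ("golf balls", "Southeast"), ("hockey pucks", "Midwest")]

# Split the data into fixed-size blocks...
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# ...and build an index: key -> block number
index = {key: i // BLOCK_SIZE for i, (key, _) in enumerate(records)}

def lookup(key):
    # Consult the index, then read only the one relevant block.
    block = blocks[index[key]]
    return next(region for k, region in block if k == key)
```

So asking for "basketballs" touches block 0 only; the rappelling gear sold in the Ozarks never gets read.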

So what’s “object data”?

Now imagine you’re on a golf course and someone balances a hockey puck on your tee. You don’t need to be told you have a hockey puck instead of the golf ball you need. Why? Because the metadata—the information about the object—is embedded in the object.

While it might take some time for you or me to find a needle (or golf ball) in a haystack, your computer has a much easier job—if it knows what it’s looking for. Modern applications can be written to find objects based on the metadata embedded within them, making object storage far more scalable and economically feasible than file or block storage.
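In code, the difference is that you query metadata rather than walk a folder tree. Here's a minimal sketch of a flat object store with a hypothetical `find` helper; the object IDs and metadata fields are invented for illustration, not drawn from any real object-storage API.

```python
# A flat namespace: every object carries its own embedded metadata.
store = {
    "obj-001": {"data": b"...", "meta": {"type": "golf ball",   "sport": "golf"}},
    "obj-002": {"data": b"...", "meta": {"type": "hockey puck", "sport": "hockey"}},
    "obj-003": {"data": b"...", "meta": {"type": "volleyball",  "sport": "volleyball"}},
}

def find(**criteria):
    """Return IDs of objects whose metadata matches every criterion."""
    return [oid for oid, obj in store.items()
            if all(obj["meta"].get(k) == v for k, v in criteria.items())]
```

Because there is no hierarchy to traverse, the same lookup works whether the store holds three objects or three billion—scaling is a matter of sharding the flat namespace, not reorganizing folders.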

Why is object data so important to HPC workloads?

Object storage has typically been thought of as a home for backups and archives: loads of old data just sitting around, waiting for the day someone needs to retrieve it. How is that relevant to high-performance computing (HPC)…?

Today’s object stores are much more performant, due in part to system architecture changes made possible by advancements in GPU technology and increased network bandwidths. Coupled with code enhancements in storage software, and the increase in capacity and endurance for NVMe devices, modern object data stores deliver unprecedented performance while maintaining incredible scalability. That’s great news for today’s analytics and AI/ML workloads, because they need access to huge amounts of data. The flat organization of objects—versus a hierarchical architecture—makes it easier for data engineers to amass data stores in situ, which they can render accessible to data scientists without having to copy to file or block protocols.

The sky’s the limit with object storage

As a recent analyst study shows, object data is not only highly scalable, it can deliver deterministic performance at massive scale—more than 10 billion objects. That’s 10 with 9 zeros! At the rate data is amassing today, that’s the scale we’ll all need. And there’s more good news. Advancements in data pipeline automation, which use object bucket notifications, event streaming, and serverless functions, transform the object data stores of the past into vibrant, real-time automated pipelines driving living insights.
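The event-driven pattern behind those automated pipelines can be sketched in a few lines. This is a toy, self-contained model—the function names and event shape are invented, not a real storage API—showing how a new-object notification can fire subscriber functions, the way bucket notifications can drive serverless processing.

```python
# Registered subscriber functions, notified on every new object
handlers = []

def on_object_created(fn):
    """Decorator: subscribe a function to object-creation events."""
    handlers.append(fn)
    return fn

def put_object(bucket, key, meta):
    """Store an object and notify every subscriber (storage itself elided)."""
    event = {"bucket": bucket, "key": key, "meta": meta}
    for fn in handlers:
        fn(event)

processed = []

@on_object_created
def tag_for_analytics(event):
    # A pipeline step: react only to the object types it cares about.
    if event["meta"].get("type") == "sensor-reading":
        processed.append(event["key"])

put_object("telemetry", "reading-42", {"type": "sensor-reading"})
put_object("telemetry", "photo-7",    {"type": "image"})
```

Each new object triggers processing the moment it lands, which is what turns a passive archive into a live pipeline.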

So fear not the data tsunami we’ve long been approaching. Today’s new tools and architectures will help you tame the ebb and flow of object data and weather the storm.
