HPC-Scale Data Management for the Enterprise That’s Easier and More Cost-Efficient than NAS


Co-authored by Shailesh Manjrekar, Head of AI and Strategic Alliances, WekaIO

Organizations with truly big data have truly big problems. What if you had petabytes of data to manage? You would need to consider where to store the data and how to access it efficiently. Chances are you intend to use the data with AI for training and inference, so you also need to consider how best to offer the data to machine learning algorithms. It’s a lot to think about. Fortunately, WekaIO™ (Weka) has your back with its advanced file system WekaFS™, built for handling today’s biggest problems.

Weka is designed to enable organizations to maximize the full value of their high-powered IT investments: compute, networking and storage. The latest evolution of Weka technology includes new features designed to deliver ease of management and performance at any scale, with a unified global namespace spanning flash, object storage and the cloud. The result is an HPC-scale data management solution for the enterprise that’s easier and more cost-efficient than traditional architectures such as Network Attached Storage (NAS) and protocols such as the Network File System (NFS).

In this article, we’ll discuss a modern storage system that simplifies data lifecycle management, that is, managing data on any tier throughout the entire pipeline. Specifically, we’ll review how it’s possible to extend the file namespace across the “hot” (performance) tier and the “cold” (object) tier, so that a company, or even a single person, can manage more petabytes of data with fewer resources. “One person can manage petabytes of data”: what a concept, and what a reality!

Unified, Global Namespace

Weka’s effort to deliver a unified, global namespace across public and private clouds is a significant advance toward simplifying data center management for enterprises working to manage petabyte-scale data stores. Weka took a different path from other parallel file system vendors, architecting its new solution for the cloud-native era with a namespace that extends across a range of performance tiers.

WekaFS makes it easy to manage petabytes of data in a single, unified namespace wherever in the pipeline the data is stored, while also delivering the performance needed to accelerate AI/ML and HPC workloads.

Extending the WekaFS namespace from high-performance flash to S3 REST-enabled cloud object storage systems is a simpler and more cost-efficient strategy for managing petascale data sets without compromising performance. The file system metadata resides on flash, while capacity extends seamlessly over private and public object storage. WekaFS allows data portability across multiple consumption models, supporting both private and public clouds with the ability to extend the namespace across both. All I/O requests are serviced by the flash tier, while the object tier is used for capacity scaling. A cloud-first model delivers the best storage efficiency and TCO across consumption models and data tiers.
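The split described above (metadata on flash, capacity in the object tier, I/O served from flash) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not WekaFS code: `InMemoryObjectStore` is a hypothetical stand-in for an S3 bucket, and the names `TieredNamespace`, `write`, `evict` and `read` are invented for this example. It uses simple write-through to the object tier; the article does not describe WekaFS’s actual data-movement mechanics.

```python
import hashlib
from dataclasses import dataclass, field


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-compatible bucket (flat key -> bytes)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


@dataclass
class Inode:
    size: int
    object_key: str


@dataclass
class TieredNamespace:
    """Toy single namespace (hypothetical): metadata on flash, capacity in objects."""

    store: InMemoryObjectStore
    metadata: dict[str, Inode] = field(default_factory=dict)  # always on "flash"
    flash: dict[str, bytes] = field(default_factory=dict)     # hot data on "flash"

    def write(self, path: str, data: bytes) -> None:
        key = hashlib.sha256(path.encode()).hexdigest()
        self.metadata[path] = Inode(size=len(data), object_key=key)
        self.flash[path] = data    # the write lands on flash first
        self.store.put(key, data)  # and is copied to the object tier

    def evict(self, path: str) -> None:
        # Free flash capacity; the object tier still holds the data.
        self.flash.pop(path, None)

    def read(self, path: str) -> bytes:
        inode = self.metadata[path]  # metadata lookups never touch the object tier
        if path not in self.flash:
            # Bring the data back onto flash so the I/O is still served from flash.
            self.flash[path] = self.store.get(inode.object_key)
        return self.flash[path]

    def listdir(self) -> list[str]:
        return sorted(self.metadata)


ns = TieredNamespace(store=InMemoryObjectStore())
ns.write("/genomes/sample1.bam", b"ACGT")
ns.evict("/genomes/sample1.bam")        # data now lives only in the object tier
print(ns.listdir())                     # prints ['/genomes/sample1.bam']
print(ns.read("/genomes/sample1.bam"))  # prints b'ACGT'
```

Because the file listing comes entirely from flash-resident metadata, the namespace looks the same to the user whether a file’s data is currently on flash or only in the object store.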

WekaFS deployment model

Deployment Modes

Essentially, WekaFS is a purely software-defined solution. The deployment model is a typical client-server architecture, with compute and storage. On the compute side, Weka works with GPU ecosystem partners (server vendors whose servers include GPUs) and also with CPU-based partners. When GPUs are coupled with NVMe flash, WekaFS can run in what is called “converged mode,” running as a container within the GPU servers. A high-performance tier is then created from the NVMe flash inside those GPU servers, so a separate, dedicated external storage tier is unnecessary. This innovative deployment strategy pairs that flash with a “capacity tier,” which is an object store.

One of the differentiators of the WekaFS stack is that it was designed from the ground up to use the S3 API as its back-end. That means you can use any object tier as a capacity tier, whether in a private cloud or a public cloud. For this purpose, Weka has partnerships with object storage vendors like Scality, Cloudian, IBM, Seagate, Red Hat, etc. to ensure a fully validated and performant storage ecosystem.

Let’s summarize the three available deployment modes:

  1. The first deployment mode is where you can use high performance flash within the GPU server and then tier to an object store.
  2. The second deployment mode separates the compute layer and runs WekaFS on a number of storage servers that together form a cluster. This is the typical deployment model, and because WekaFS is software-defined, it can run on storage servers from many vendors. Weka works with a number of server OEM partners like HP, Hitachi, Cisco, Dell, etc., turning their storage servers into high-performance storage systems.
  3. The third way WekaFS gets deployed is in the public cloud itself. WekaFS is available in the AWS marketplace for a purely public cloud installation. WekaFS also works with AWS Outposts which allows you to run AWS infrastructure and services on premises for a consistent hybrid experience.

Ease of Management

As stated above, WekaFS offers significant ease of management by providing a single global namespace to an application or an administrator. Specifically, the way WekaFS works with object stores is distinctive. People tend to call this “tiering,” but it is not really tiering at all. All the metadata is stored on the flash tier, and all the data gets pushed to the object store. As a result, WekaFS presents the application or user with a single global namespace.

In the figure above, the large gray box is a single namespace. There are not two namespaces, a file namespace on one tier and an S3 namespace on the other; they are combined into one. For example, if you have one petabyte of flash and five petabytes of object storage, an application or end user sees six petabytes of file namespace. This makes the whole namespace easy to manage. You are managing just one file system: you don’t need different administrators, and you don’t need different mechanisms to push data to the object store or bring it back, because WekaFS manages all of that.
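The capacity arithmetic in that example can be written out directly. This is a minimal sketch assuming, as the text describes, that the flash and object tiers are presented as one file namespace; `combined_namespace` is a hypothetical helper, and the flash share it reports is a plain ratio, not a Weka sizing rule.

```python
from fractions import Fraction


def combined_namespace(flash_pb: int, object_pb: int) -> tuple[int, Fraction]:
    """Return (total namespace size in PB, fraction of it that fits on flash).

    Hypothetical helper: assumes both tiers appear as one file namespace.
    """
    total = flash_pb + object_pb
    return total, Fraction(flash_pb, total)


total, flash_share = combined_namespace(flash_pb=1, object_pb=5)
print(f"{total} PB namespace, {flash_share} (~{float(flash_share):.1%}) on flash")
# prints: 6 PB namespace, 1/6 (~16.7%) on flash
```

The user sees the 6 PB total; the 1/6 figure is simply how much of that namespace can sit on flash at any one time.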

WekaFS has a lot of intelligence built into how data is moved around, including heuristics and policy-based management. This is what is meant by “ease of use capability.”
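The article does not spell out which heuristics or policies WekaFS uses, so the sketch below swaps in a generic, well-known one: least-recently-used (LRU) eviction from flash under a byte budget. The class `LRUFlashPolicy` and its `touch` method are hypothetical, and the sketch assumes evicted data is already safe in the object tier.

```python
from collections import OrderedDict


class LRUFlashPolicy:
    """Hypothetical flash-residency policy (generic LRU, not Weka's algorithm).

    Keeps recently used files on flash and evicts the least recently used ones
    once a byte budget is exceeded. Evicted data is assumed to already be in
    the object tier, so eviction only frees flash space.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.resident: OrderedDict[str, int] = OrderedDict()  # path -> size, oldest first

    def touch(self, path: str, size: int) -> list[str]:
        """Record an access to `path`; return the paths evicted to make room."""
        if path in self.resident:
            self.resident.move_to_end(path)  # now the most recently used
            return []
        self.resident[path] = size
        self.used += size
        evicted = []
        # Evict oldest entries first, but never the file that was just accessed.
        while self.used > self.capacity and len(self.resident) > 1:
            old_path, old_size = self.resident.popitem(last=False)
            self.used -= old_size
            evicted.append(old_path)
        return evicted


policy = LRUFlashPolicy(capacity_bytes=100)
policy.touch("/proj/a", 60)
policy.touch("/proj/b", 30)
policy.touch("/proj/a", 60)         # /proj/a becomes the most recently used
print(policy.touch("/proj/c", 20))  # prints ['/proj/b']
```

LRU is only one possible policy; age-based or project-based rules would plug into the same `touch` hook.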

WekaFS also offers easier management than traditional NFS. NFS was designed decades ago and was not meant to leverage today’s architectures and transports. As the figure above shows, transports are moving to 100-200 Gb/s networking, and we work with both Ethernet and InfiniBand. Part of the ease of management comes from not using NFS. NFS cannot take advantage of high-performance networks, and those networks are what make it possible to move away from the data-locality model: you can keep shared storage without the latency of moving data to wherever it is processed. When it comes to high-performance networking, NFS becomes an inhibitor.
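Because these link speeds are quoted in gigabits (not gigabytes) per second, a back-of-the-envelope conversion helps show why copying petabytes to wherever compute runs is costly. This sketch is idealized: it assumes a single link at full line rate, decimal petabytes, and no protocol overhead; the helper names are hypothetical.

```python
def line_rate_gbytes_per_s(link_gbits_per_s: float) -> float:
    """Convert a link rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return link_gbits_per_s / 8


def hours_to_transfer(dataset_bytes: float, link_gbits_per_s: float) -> float:
    """Idealized time to move a dataset at full line rate (no protocol overhead)."""
    seconds = dataset_bytes * 8 / (link_gbits_per_s * 1e9)
    return seconds / 3600


PETABYTE = 1e15  # decimal petabyte, in bytes
for rate in (100, 200):
    print(f"{rate} Gb/s = {line_rate_gbytes_per_s(rate):.1f} GB/s; "
          f"1 PB takes ~{hours_to_transfer(PETABYTE, rate):.1f} h")
# prints:
# 100 Gb/s = 12.5 GB/s; 1 PB takes ~22.2 h
# 200 Gb/s = 25.0 GB/s; 1 PB takes ~11.1 h
```

Even at these rates, moving a petabyte takes the better part of a day, which is why reading data in place from fast shared storage beats copying it to each compute node.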

WekaFS provides a userspace agent, shown as the “W” in the client in the figure above, which is a fully POSIX-compliant endpoint. This brings another level of ease of management: you don’t have to manage volumes as you do with flash arrays, and you don’t have to manage large directory structures. You also get the performance of the new networks, whether InfiniBand or Ethernet, at 100 Gb/s or 200 Gb/s. WekaFS achieves this performance because the agent is intelligent: rather than talking to one particular node, it knows exactly where in the storage cluster to go to read and write the data. That is what’s meant by “ease of management.”
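The idea of a client that “knows exactly where it wants to go” can be sketched with a deterministic placement function that every client can compute locally. The article does not describe WekaFS’s real placement scheme, so this sketch swaps in rendezvous (highest-random-weight) hashing; the 1 MiB chunk size, the node names and the helpers `chunk_owner` and `plan_read` are all hypothetical.

```python
import hashlib


def chunk_owner(path: str, chunk_index: int, nodes: list[str]) -> str:
    """Pick the node holding a chunk via rendezvous hashing (hypothetical scheme)."""

    def weight(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{path}|{chunk_index}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=weight)


def plan_read(path: str, offset: int, length: int, nodes: list[str],
              chunk_size: int = 1 << 20) -> dict[str, list[int]]:
    """Map a byte range to the chunks each node must serve.

    The client can then send requests to all of those nodes in parallel
    instead of funnelling every request through a single server.
    """
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    plan: dict[str, list[int]] = {}
    for idx in range(first, last + 1):
        plan.setdefault(chunk_owner(path, idx, nodes), []).append(idx)
    return plan


nodes = [f"storage-{i}" for i in range(8)]
plan = plan_read("/genomes/cohort.vcf", offset=0, length=16 << 20, nodes=nodes)
for node, chunks in sorted(plan.items()):
    print(node, chunks)  # which 1 MiB chunks to request from which node
```

Because placement is a pure function of the path, chunk number and node list, no central server has to be consulted for each I/O, which is the property the paragraph contrasts with a single NFS endpoint.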

Now, a single administrator, rather than managing a large cluster with NFS, need only manage a single endpoint. In addition, WekaFS now supports another mode of operation, NVIDIA GPUDirect Storage, the next generation of I/O technology pioneered by NVIDIA, which can provide 60x to 100x better performance than NFS. That is game-changing. This kind of performance is required when working with GPU servers and high-performance servers running demanding, next-generation applications such as machine learning, deep learning, conversational AI, simulation, etc.

Reference Architectures

In order to streamline enterprise adoption, Weka has developed reference architectures in collaboration with a number of leading object storage technology providers (WekaFS certified) such as Amazon Web Services (AWS), Cloudian, Hitachi Vantara, IBM, Seagate, Quantum, Scality, and others. The reference architectures are designed to uniquely deliver cost-efficient, cloud-native data storage solutions at any scale.

Now that Weka is certified with a wide-ranging ecosystem of technology partners to deliver a unified storage solution, customers benefit from the best of both worlds, specifically leveraging NVMe flash as a high performance hot tier joined seamlessly with the capacity and economics of an object store archive tier. Combined with additional robustness and replication benefits, Weka provides a single solution that securely stores and accelerates data through the entire pipeline.

Customer benefits include: (i) faster actionable BI from a single high-performance storage solution; (ii) cost-efficiency, with the ability to manage, scale and share project data; (iii) operational agility across edge, core, and cloud, eliminating storage silos; and (iv) enterprise robustness and secure data governance.

Use Case: Genomics England

As data has become a strategic asset for businesses, ease of namespace management is paramount. The data sets encountered in AI/ML, genomics, HPC, and HPDA have grown so large and dynamic that many organizations are seeking a new generation of solutions.

As one good use case example, consider Genomics England (GEL), whose challenge was to overcome poor performance at large capacity scale. GEL had already acquired 21 petabytes of genome data and projected growth to over 140 petabytes by 2023. Its research requires access to the entire data set and must allow researchers to query the data in a highly randomized fashion. Therefore, all data must be stored in a single storage system.

GEL’s scale-out NAS solution had already hit its limit on storage node scaling, and performance suffered when the system was near capacity. Weka came to the rescue as the only vendor that could deliver a solution that met all the requirements in a single architecture.

“We needed something that’s much more scalable than existing NAS solutions – an infrastructure that could grow to hundreds of Petabytes. Our existing solution couldn’t provide that scale and wasn’t performing as well in these magnitudes – that’s what drove us to WEKA,” said David Ardley, Director of Infrastructure Transformation, Genomics England.

Conclusion

WekaFS is a modern parallel file system used by large enterprise organizations to solve the newest, biggest problems holding back innovation and discovery. Purpose-built to unlock the full capabilities of today’s accelerated and agile data center, WekaFS is optimized for NVMe flash and the hybrid cloud. Its modern architecture handles the most demanding storage challenges in the most data-intensive technical computing environments, delivering truly epic performance at any scale and enabling organizations to maximize the full value of their high-powered accelerators, such as GPUs and FPGAs. Weka helps industry leaders solve big IT infrastructure problems and extract more value from their data faster.