Storage as a Service in AWS

By Shailesh Manjrekar, Head of AI and Strategic Alliances, WekaIO

The AWS Cloud is an ideal platform to support an agile compute environment for AI workloads requiring HPC methods to accelerate training and inference. A prime ingredient of this effort is a fast, scalable file system on AWS to ensure applications never have to wait for data.

In this article, we’ll take a look at “storage as a service” on Amazon Web Services (AWS), and how Weka’s Limitless Data Platform makes it possible. You can also plan your cluster on Amazon Elastic Compute Cloud (EC2) with Weka’s self-service tools, which help you configure the most cost-efficient Amazon EC2 instances for the storage and performance requirements of your application.

Business Problems and Solutions

When considering technology alternatives, it’s important to look at specific business problems and at how companies are trying to solve them, with the ultimate goal of accelerating innovation and improving time-to-market. Organizations are deluged by data today. What they do with it, and how they do it, will likely be one of the key measures of their success going forward. The big challenges include the increasing density of the compute layer, shrinking budgets, and the struggle to operationalize data assets and extract actionable intelligence from them.

Many times, solutions to these challenges suggest the need for a “cloud first strategy.” A cloud-first strategy is an organizational commitment to evaluate cloud-based solutions before considering other technology alternatives. Unlike a “cloud-only” strategy, it doesn’t entirely eliminate other technology solutions, allowing for greater flexibility.

Several important business drivers are fueling the cloud-first strategy, along with some new and disruptive trends in DevOps, ITOps, and MLOps. For example, containerization is becoming more popular, with Kubernetes becoming the cloud-native operating system. Then there is the increasing adoption of NVMe, flash and stateful applications, and GPUs for AI initiatives. These are some of the main reasons businesses are evaluating a cloud-first strategy.

Weka Value Proposition

Weka’s Limitless Data Platform, built on the Weka File System (WekaFS), was born in the cloud. It is an optimal storage solution for cloud and hybrid cloud strategies. Weka was developed on AWS, delivering very high performance with 100Gb/sec of throughput and under 250 microseconds of latency. The Weka file system can be extended across multiple availability zones without any performance degradation. It offers best-in-cloud performance, demonstrated by a number of industry benchmarks including IO500, and provides a high-performance storage solution for customers using the kdb+ time series database from KX Systems. Weka supports hybrid deployments with and without AWS Outposts. AWS Outposts is an innovation with massive opportunity in financial services, life sciences, and other industries that delivers the benefits of the public cloud to on-premises deployments. Weka is fully validated on AWS Outposts.

“WekaFS offers kdb+ a combination of good read performances and meta data operational latency, being one or two orders of magnitude better than Amazon Elastic File System (EFS), storage gateways and all open-source products we tested,” said Glenn Wright, Systems Architect, KX Systems.

Weka is ideal for the cloud, blending enterprise and on-premises features with those that are fundamental to the cloud: unlimited elasticity to grow and shrink, utility pricing, and Kubernetes readiness. Even though Weka is cloud-native, it offers all of the following enterprise features:

  • Weka works with Amazon Simple Storage Service (S3) and AWS Outposts
  • Weka provides tiering
  • Weka provides data protection with snapshots and clones
  • Weka provides quotas
  • Weka supports identity management (IM)
  • Weka is cost-effective (20-30% less expensive than alternative solutions)

Weka is deployed on Amazon EC2 I3en instances, and the application clients can be C5n or other instance types. Weka uses Amazon S3 as a capacity tier behind it. Weka also provides a CloudFormation-based AWS portal, start.weka.io, where customers can specify the configuration they want.
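
To make the idea of sizing a cluster from storage and performance requirements concrete, here is a minimal sketch of the arithmetic involved. It is not how start.weka.io works internally; the `InstanceProfile` type, the `instances_needed` function, and the per-instance figures in the usage note are hypothetical placeholders chosen for illustration, not published I3en or Weka numbers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceProfile:
    """Hypothetical per-instance figures; replace with measured values."""
    name: str
    usable_tb: float        # usable capacity contributed by one instance, in TB
    throughput_gbs: float   # throughput contributed by one instance, in GB/s


def instances_needed(capacity_tb: float,
                     throughput_gbs: float,
                     profile: InstanceProfile,
                     minimum: int = 1) -> int:
    """Return the smallest instance count that satisfies both requirements.

    The cluster must be large enough for the capacity target AND the
    throughput target, and never smaller than an assumed minimum size.
    """
    if capacity_tb < 0 or throughput_gbs < 0:
        raise ValueError("requirements must be non-negative")
    by_capacity = math.ceil(capacity_tb / profile.usable_tb)
    by_throughput = math.ceil(throughput_gbs / profile.throughput_gbs)
    return max(by_capacity, by_throughput, minimum)
```

With an assumed profile of 10 TB and 2 GB/s per instance, a workload needing 95 TB and 8 GB/s is capacity-bound (10 instances), while one needing 5 TB and 9 GB/s is throughput-bound (5 instances). Whichever requirement demands more instances sets the cluster size.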

Use Case Examples

Let’s consider a number of central use case examples of Weka deployment using hybrid cloud and AWS.

  • Finance – finance industry customers using Weka in AWS for applications such as fraud analytics, risk management, and improving business performance through analytics.
  • Life Sciences – accelerating drug and vaccine discovery, precision medicine.
  • Artificial Intelligence (AI) and machine learning (ML) – customers using Weka for autonomous vehicles, conversational AI, healthcare, and increasing scale and profitability with recommendation systems.
  • HPC – oil and gas, and manufacturing deployments along with media & entertainment for studio-in-the-cloud.

One particular adopter is Genomics England (GEL), owned by the United Kingdom’s Department of Health and Social Care, which has selected WekaFS to accelerate genomics research for the 5 Million Genomes Project. Genomics England chose Weka to meet the capacity scaling predicted for the coming five years while delivering the highest performance to its DNA pipeline.

Other adopters include Tre Altamira, a leader in measuring ground and structural movement from space using satellite radar geospatial data. The organization chose the Weka data platform on AWS for its stability and Weka’s stellar support for Tre Altamira’s geospatial workflows. Also, Untold Studios (studio-as-a-service with rendering) chose Weka because of the ability to tier to S3 storage for best cost given the volume of data the company is creating.

WekaFS is a fully parallel and distributed file system that has been designed from scratch to leverage both high-performance flash technology and cost-effective disk storage, in a single global namespace. Data and metadata are both distributed across the entire storage infrastructure to ensure massively parallel access to NVMe drives. Data is seamlessly tiered from flash to disk with Weka’s internal tiering mechanism, achieving the optimum use of storage media for the best economics.

Some important proof points include linear scale, low latency as demonstrated by the STAC (Securities Technology Analysis Center) benchmark, and metadata operations as demonstrated by the IO500 benchmark.

Cloud Native Data Management

Weka supports on-premises, cloud, and converged deployments. This range of deployment models gives a lot of flexibility, and with Kubernetes support Weka can orchestrate across all three. The following summarizes Weka’s benefits in the cloud:

Autoscaling: provides dynamic provisioning, so you don’t have to worry about how many nodes to schedule; the storage scales automatically with your workload needs.

Extend file system namespace over S3:

  • Filesystem metadata resides on flash, while capacity extends seamlessly over an on-premises or public object store
  • Performance of NAND Flash and the economics of S3
  • Data migration to another S3 target; attach and detach S3 namespaces
  • Instant, space-efficient snapshots to S3 eliminate the need for backup software
  • Large files are chopped into small objects to achieve parallelism; partial files can be rehydrated and modified (great for images and other large files)
  • Stage next workload to NVMe while current workload is running
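
The point above about chopping large files into small objects can be illustrated with a short sketch. This is only a generic fixed-size chunking scheme, not Weka’s actual on-disk or object layout; the function names and the object size used in the usage note are assumptions made for the example.

```python
from typing import List, Tuple


def split_into_objects(file_size: int, object_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for each object a file of file_size bytes maps to.

    Every object has object_size bytes except possibly the last one.
    """
    if file_size < 0 or object_size <= 0:
        raise ValueError("file_size must be >= 0 and object_size must be > 0")
    return [
        (offset, min(object_size, file_size - offset))
        for offset in range(0, file_size, object_size)
    ]


def objects_for_range(offset: int, length: int, object_size: int) -> range:
    """Return the indices of the objects that overlap [offset, offset + length).

    Only these objects need to be fetched ("rehydrated") to read or modify
    that part of the file; the rest can stay in the object store.
    """
    if offset < 0 or object_size <= 0:
        raise ValueError("offset must be >= 0 and object_size must be > 0")
    if length <= 0:
        return range(0)
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return range(first, last + 1)
```

Under these assumptions, a 1 GiB file with a 4 MiB object size maps to 256 objects that can be transferred in parallel, and reading 4 MiB starting at the 5 MiB mark touches only objects 1 and 2 rather than the whole file.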

Extend to multi-cloud: Weka has the built-in ability to extend to multi-cloud which means you can have one copy in an on-prem object store, and another copy in Amazon S3. This way you take away the “friction” associated with having a single vendor provide the object store.

Snap2Object for cloud bursting, elasticity, and DR: Weka has built-in cloud bursting and data migration. Snap-to-object (“Snap2Object”) is a capability built into the file system that takes a snapshot of the entire file namespace. The snapshot can then be moved to the cloud for cloud bursting, elasticity, and DR, both in AWS and on-premises.

In addition, the same snap-to-object capability can be used for multi-data center replication. Here are some additional benefits for cloud bursting:

  • EC2 cluster can be formed based on S3 snapshot data
  • The original on-prem cluster can continue running and take snapshots
  • The EC2 cluster can run concurrently and take snapshots
  • Each snapshot that is pushed to S3 can be linked back to the other system
  • The data is viewed via the namespace using the .snapshots directory and data can be merged

Security: Weka has built-in encryption. It provides end-to-end encryption at rest and in flight, and works with leading key management systems (KMS) such as HashiCorp Vault, providing encryption both on-prem and in the cloud.

Containers and Kubernetes: Weka also offers support for containers and Kubernetes. This includes a solution with Rancher Labs that provides a Kubernetes-as-a-service capability for enterprise Kubernetes strategies. WekaFS provides a CSI plugin that enables stateful applications to be hosted either on-prem or in the cloud. With that, Weka is able to support deployment models such as hybrid cloud and CI/CD (“continuous integration, continuous deployment”) pipelines.

Availability in AWS Marketplace

For convenience, Weka is available in the AWS Marketplace as a complete SaaS offering that both end customers and channel partners can use. In general, marketplace business is growing across all SaaS providers, and this is how end customers consume Weka as high-performance software as a service.

Weka can be deployed on any EC2 instance that has local SSD or NVMe storage and dramatically improve your file storage. Take advantage of the elastic compute resources available in the cloud for massive scale. Integrated and transparent tiering to S3 provides best cost and infinite scale.

Summary

In this article we touched upon a number of important benefits of AWS storage with Weka. We saw how to leverage AWS to dramatically improve application performance and scale. To get started and plan your cluster on AWS, go to https://start.weka.io/
