Habana Labs Announces Turnkey AI Training Solution Featuring Supermicro Server and DDN Storage

TEL-AVIV, ISRAEL and SANTA CLARA, Calif. – November 16, 2021 – Habana Labs, an Intel Company and developer of AI processors, today announced the availability of a turnkey, enterprise-class AI training solution featuring the Supermicro X12 Gaudi AI Training Server with the DDN AI400X2 Storage system.

This system is the product of the collaboration of Habana Labs and Supermicro with DDN, a leader in AI data management and storage. With eight Habana Gaudi purpose-built AI processors, the Supermicro X12 Gaudi AI Server provides customers with highly cost-efficient AI training, ease of use and system scalability. Integration of the Gaudi platform with the DDN AI400X2 appliance eliminates storage bottlenecks found in traditional NAS storage and optimizes utilization of AI compute capacity.

As data sets grow larger and AI models grow more complex, demand for AI training capacity is increasing dramatically. According to IDC’s Semiannual Artificial Intelligence Tracker, published in January 2021, over half of respondents who are AI/ML customers report rebuilding their AI models weekly or more often, and over a quarter rebuild models daily or even hourly. At the same time, 56 percent of AI/ML customers report that cost is the most significant challenge to implementing AI/ML solutions. Habana Gaudi was designed from inception to address this need with better price performance. With the Supermicro X12 Gaudi AI server integrated and optimized with the DDN AI400X2 storage appliance, customers that require enterprise-class, cost-effective AI training systems with enhanced data management and storage can train more and spend less.

“The Habana team is committed to bringing Gaudi’s price performance, usability and scalability to enterprise AI customers who need more cost-effective AI training solutions,” said Eitan Medina, chief business officer of Habana Labs. “We are pleased to support our customers with this new turnkey solution that brings the efficiency of the Supermicro X12 Gaudi AI Server together with the data management and storage performance of the DDN AI400X2 system to augment utilization of AI compute capacity and enable us to address this growing need in training deep learning models.”

The turnkey AI training solution comes pre-configured in one-, two- and four-server X12 options to address AI training capacity requirements. The scalable architectures of the Supermicro X12 Gaudi Server and DDN AI400X2 appliance make it easy to expand to larger clusters, so customers can scale their AI training infrastructure as their capacity requirements increase. Each Gaudi processor integrates ten 100 Gigabit Ethernet ports that support RDMA over Converged Ethernet (RoCE), providing easy and massive scaling capacity based on industry-standard networking fabrics.

The solution is validated with the Habana SynapseAI® Software Platform and workloads running with Habana’s optimized TensorFlow and PyTorch Docker container images from the Habana Software Vault. The Habana Developer Site and reference models on Habana GitHub repositories make it easy for data scientists and developers to get started with building new models or migrating existing models for Gaudi. The solution is delivered and supported globally by DDN and Supermicro via partners worldwide for quick and easy deployment.

DDN’s AI400X2 appliance is a fully integrated and optimized platform that brings simplicity and cost-effective data management to AI workloads at any scale. The appliance can be deployed as either an all-flash NVMe system or a hybrid NVMe and disk system, so customers can choose how best to scale performance and capacity. Individual systems can be deployed with up to 720TB of NVMe flash and 6.4PB of hard disk storage, and deliver greater than 90GB/s throughput and 3M IOPS. Through automation and powerful data management features, even the most complex AI workflows can be streamlined with a single storage solution. Reducing the number of systems and the data center footprint required to deliver storage performance and capacity can significantly reduce power, cooling and administrative overhead.
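
To put the quoted storage throughput in context, the short Python sketch below divides the "greater than 90GB/s" figure evenly across the Gaudi processors in each pre-configured option. It assumes a single AI400X2 appliance shared by all servers and a perfectly even split. The announcement does not state how many appliances ship with each configuration, so the results are illustrative only, and the function name is hypothetical.

```python
# Figures quoted in the announcement.
STORAGE_THROUGHPUT_GB_S = 90   # lower bound for one AI400X2 ("greater than 90GB/s")
GAUDI_PER_SERVER = 8           # Gaudi processors per Supermicro X12 server
SERVER_OPTIONS = (1, 2, 4)     # pre-configured server counts

def storage_share_per_gaudi(servers: int) -> float:
    """Even split of one appliance's throughput across all Gaudi processors, in GB/s.

    Assumption (not from the announcement): one appliance serves every server.
    """
    return STORAGE_THROUGHPUT_GB_S / (servers * GAUDI_PER_SERVER)

for servers in SERVER_OPTIONS:
    gaudi_count = servers * GAUDI_PER_SERVER
    share = storage_share_per_gaudi(servers)
    print(f"{servers} server(s): {gaudi_count} Gaudi, about {share:.2f} GB/s each")
```

Under these assumptions, even the four-server option leaves each of its 32 processors with a share of roughly 2.8 GB/s of storage throughput.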

The Supermicro X12 Gaudi AI Training Server features eight Gaudi HL-205 mezzanine cards, dual 3rd Gen Intel® Xeon® Scalable processors, two PCIe Gen 4 switches, four hot-swappable NVMe/SATA drives, fully redundant power supplies, and 24 x 100GbE RDMA ports (6 QSFP-DDs), enabling near-linear system scale-out. The system contains up to 8TB of DDR4-3200 memory, unlocking the Gaudi AI processors’ full potential. The HL-205 is compliant with the OCP-OAM (Open Compute Project Accelerator Module) specification. Each Gaudi incorporates 32GB of on-chip HBM2 memory.
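
The per-server totals implied by these specifications can be worked out with simple arithmetic, sketched below in Python. All inputs come from the announcement; the variable names are illustrative. The announcement does not say how the ports that are not exposed externally are used, so that figure is only a derived count.

```python
# Per-server figures quoted in the announcement.
GAUDI_PER_SERVER = 8
HBM_PER_GAUDI_GB = 32
ROCE_PORTS_PER_GAUDI = 10      # ten 100GbE RoCE ports per Gaudi
PORT_SPEED_GBIT_S = 100
EXTERNAL_PORTS = 24            # 24 x 100GbE exposed for scale-out
QSFP_DD_CONNECTORS = 6

# Total accelerator memory in one server.
total_hbm_gb = GAUDI_PER_SERVER * HBM_PER_GAUDI_GB

# Aggregate scale-out bandwidth leaving the chassis.
external_gbit_s = EXTERNAL_PORTS * PORT_SPEED_GBIT_S

# 100GbE links carried by each QSFP-DD connector.
ports_per_connector = EXTERNAL_PORTS // QSFP_DD_CONNECTORS

# RoCE ports across all eight processors, and how many are not exposed externally.
total_roce_ports = GAUDI_PER_SERVER * ROCE_PORTS_PER_GAUDI
non_external_ports = total_roce_ports - EXTERNAL_PORTS

print(f"HBM2 per server: {total_hbm_gb} GB")
print(f"Scale-out bandwidth: {external_gbit_s} Gb/s ({external_gbit_s / 8:.0f} GB/s)")
print(f"100GbE links per QSFP-DD: {ports_per_connector}")
print(f"RoCE ports not exposed externally: {non_external_ports} of {total_roce_ports}")
```

This gives 256GB of HBM2 per server, 2,400 Gb/s of external scale-out bandwidth, and four 100GbE links per QSFP-DD connector.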
