HPE Launches 2 HPC-AI Offerings for ML Implementation and Collaboration


Two announcements today from HPE underscore the strategic imperative of combining HPC and AI — along with demand for systems that ease AI implementation complexity.

HPE announced its Machine Learning Development System, designed to accelerate AI model training at scale. HPE said the system delivers value in days, rather than the typical weeks or months, by combining its HPC resources with the open source ML platform of Determined AI, a developer of AI training software that HPE acquired last June.

In addition, the company announced HPE Swarm Learning, an edge AI offering the company said fosters collaboration by allowing organizations to share AI model learnings, rather than data, with other organizations. The goal: improve model accuracy and reduce AI biases while ensuring data privacy and governance.

HPE said its Machine Learning Development System is an end-to-end solution that integrates machine learning software, compute, accelerators and networking to develop and train more accurate AI models at scale.

The system is built on HPE’s Apollo 6500 Gen10 server with eight NVIDIA A100 80 GB GPUs. It also uses HPE Performance Cluster Management software and NVIDIA Quantum InfiniBand. HPE ProLiant DL325 servers and a 1 Gb Ethernet Aruba CX 6300 switch handle stack and system component management. The system is offered in configurations from 32 GPUs to 256 GPUs.

HPE said the small configuration delivers approximately 90 percent workload scaling efficiency. Based on internal testing, the 32-GPU system delivers up to 5.7X faster throughput for a natural language processing workload compared to “another offering containing 32 identical GPUs, but with a sub-optimal interconnect,” HPE said.
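For readers unfamiliar with the metric, scaling efficiency is commonly computed as measured multi-GPU throughput divided by ideal linear throughput (single-GPU throughput times GPU count). HPE did not publish its exact methodology, so the sketch below assumes that common definition, and the throughput figures in it are hypothetical, chosen only to illustrate a 90 percent result.

```python
def scaling_efficiency(single_gpu_throughput: float,
                       multi_gpu_throughput: float,
                       num_gpus: int) -> float:
    """Return measured throughput as a fraction of ideal linear scaling.

    Assumes the common definition; HPE's own test methodology is not
    described in the announcement.
    """
    if num_gpus <= 0 or single_gpu_throughput <= 0:
        raise ValueError("GPU count and single-GPU throughput must be positive")
    ideal = single_gpu_throughput * num_gpus
    return multi_gpu_throughput / ideal


# Hypothetical numbers: 100 samples/s on one GPU, 2,880 samples/s on 32 GPUs.
efficiency = scaling_efficiency(100.0, 2880.0, 32)
print(f"{efficiency:.0%}")
```

Under this definition, an efficiency of 90 percent on 32 GPUs means the cluster does the work of roughly 29 perfectly scaled GPUs.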

Aleph Alpha GmbH, a German AI startup, said it has adopted the system to train its multimodal AI, which includes NLP covering five languages and computer vision for complex texts, higher-level understanding summaries and specific information search across hundreds of documents in a conversational context.

“We are seeing astonishing efficiency and performance of more than 150 teraflops by using the HPE Machine Learning Development System,” said Jonas Andrulis, founder-CEO of Aleph Alpha. “The system was quickly set up and we began training our models in hours instead of weeks. While running these massive workloads, combined with our ongoing research, being able to rely on an integrated solution for deployment and monitoring makes all the difference.”

The system is available now worldwide, HPE said.

HPE said Swarm Learning was developed by Hewlett Packard Labs and described it as “the industry’s first privacy-preserving, decentralized ML framework for the edge or distributed sites.”

Using HPE’s swarm API, Swarm Learning provides customers with containers that are integrated with AI models, enabling users to share AI model learnings within their organization and with industry peers to improve training, without sharing actual data.
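The idea of sharing model learnings rather than data can be illustrated with generic parameter averaging: each site trains on its own data and contributes only its model parameters, which are merged into a shared model. The sketch below is not HPE's swarm API, and the announcement does not describe how Swarm Learning merges models; the function name `merge_parameters` and the optional per-site weights are hypothetical, used only to show the general technique.

```python
from typing import List, Optional, Sequence


def merge_parameters(peer_params: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of parameter vectors contributed by each site.

    Hypothetical helper: only parameters are exchanged here, never the
    training data behind them. Not part of any HPE product API.
    """
    if not peer_params:
        raise ValueError("at least one parameter vector is required")
    size = len(peer_params[0])
    if any(len(p) != size for p in peer_params):
        raise ValueError("all parameter vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(peer_params)  # treat every site equally
    if len(weights) != len(peer_params):
        raise ValueError("one weight per site is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * p[i] for w, p in zip(weights, peer_params)) / total
        for i in range(size)
    ]


# Two sites contribute locally trained parameters; no raw records leave a site.
site_a = [1.0, 2.0]
site_b = [3.0, 4.0]
shared_model = merge_parameters([site_a, site_b])
```

Weights could, for example, reflect how many samples each site trained on, so that larger datasets influence the shared model more.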

HPE cited the University of Aachen in Germany, which studies histopathology to accelerate diagnosis of colon cancer. A team of cancer researchers at the University Hospital of RWTH Aachen University conducted a study applying AI to image processing to predict genetic alterations, which can cause cells to become cancerous. They trained AI models using HPE Swarm Learning on three groups of patients from Ireland, Germany and the U.S. and validated the prediction performance in two independent datasets from the U.K. using the same swarm learning-based AI models.

The university said the results demonstrated that models trained with swarm learning outperformed the original AI models, which were trained only on local data, because participants shared learnings, but not patient data, with other entities to improve predictions.

In addition, HPE said TigerGraph, provider of a graph analytics platform, combined Swarm Learning with its data analytics offering running on HPE ProLiant servers using AMD EPYC processors. They used the system to detect unusual activity in credit card transactions. TigerGraph said the solution increases accuracy when training machine learning models from large quantities of financial data from multiple banks and branches.

“Swarm learning is a new, powerful approach to AI that has already made progress in addressing global challenges such as advancing patient healthcare and improving anomaly detection that aid efforts in fraud detection and predictive maintenance,” said HPE’s Justin Hotard, EVP/GM, HPC & AI. “HPE is contributing to the swarm learning movement in a meaningful way by delivering an enterprise-class solution that uniquely enables organizations to collaborate, innovate, and accelerate the power of AI models, while preserving each organization’s ethics, data privacy, and governance standards.”