Intel Addresses the Convergence of AI, Analytic, and Traditional HPC Workloads


In this sponsored post, Intel explores how HPC is becoming ‘more than just HPC.’ Traditional HPC workloads are converging with AI and analytic projects. Enter the new world of HPC. 


The cloud software ecosystem has developed so that users now have expectations of both supercomputer performance and access to the latest technologies. (Photo: Shutterstock/BeeBright)

HPC is no longer just HPC, but rather a mix of workloads that reflects the convergence of AI, traditional HPC modeling and simulation, and HPDA (High Performance Data Analytics). Exit the traditional HPC center that runs only modeling and simulation; enter a world that must support converged HPC-AI-HPDA computing, sometimes on specialized hardware.

Intel has put significant effort into developing solutions that pool workload-optimized, on-premises, batch-oriented HPC clusters — even when those clusters run across a WAN at geographically distant locations, and even when part of the user base wants to run on accelerators or in a cloud-based environment.

This article provides an overview of the drivers behind the convergence of HPC-AI-HPDA workloads. We briefly touch on Intel solutions for traditional batch-oriented clusters; readers who want a more detailed discussion of pre-validated Intel® Select Solutions and their compatibility with selected open-source batch schedulers should read the Intel Select Solutions for HPC AI Converged with Open-Source Batch Schedulers Solution Brief.

Cloud and AI have stimulated mass demand and innovation

Trish Damkroger, VP, Intel Data Center, recently said, “High-performance computing is a strategic capability to accelerate scientific discovery and industrial innovation, further driving our economic competitiveness, technology leadership, and national security.”

AI and HPDA

The huge growth in electronically analyzable data, coupled with a rapidly maturing software ecosystem of industry-standard AI and data analytics tools, lets users work with data in ways that have transformed the computer industry and created a new era in HPC. In reality, HPC should now be thought of as HPC-AI-HPDA, and AI will remain a permanent workload in the HPC datacenter.

Cloud

Similarly, cloud computing is acting as a massive source of innovation, as it gives everyone access to a supercomputer “secret weapon” for their AI and HPC needs. Now a huge mass audience of SMBs (small and medium businesses) and small research teams have access via HPC-as-a-Service (HPCaaS) offerings to software tools that can run at supercomputer scale with supercomputer class performance.


Further, the cloud software ecosystem has matured to the point that users now expect both supercomputer performance and access to the latest technologies.

Dan Stanzione, executive director at TACC (Texas Advanced Computing Center), succinctly summarizes this by stating, “Giving users access to the cloud means they can experiment with the latest architectures as cloud providers are deploying those all the time.”

Meeting disparate needs with pooled workload-optimized clusters

To meet user expectations and customer requirements, Intel advocates the use of workload optimized clusters that can be pooled together into a unified cluster architecture.

The result maximizes the value of existing resources, because the resource manager, not humans, works 24/7 to keep the hardware busy.

For example, many organizations don’t run their deep learning infrastructure on a 24/7 basis. The part-time nature of these workloads means that the special-purpose infrastructure often sits idle and may require rare, specialized skills to support, both of which can be costly to the business.

The resource manager

To support a pooled environment using existing HPC batch schedulers, Intel has created solutions for popular batch schedulers that help users submit AI and analytics jobs so those workloads run efficiently alongside traditional ones. The abstraction offered by these solutions dramatically simplifies implementation for customers.

These solutions also support cloud jobs in a batch HPC environment. Simulation and modeling workloads continue to operate as usual, creating a unified environment from the standpoint of resource management. This means users can burst to the cloud to conserve on-premises resources, or experiment with new hardware and software in the cloud.
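To illustrate how an AI workload can ride on an existing HPC batch scheduler, here is a minimal, hypothetical Slurm job script. The partition name, node counts, and training script are assumptions for illustration only, not part of Intel's solutions:

```shell
#!/bin/bash
# Hypothetical Slurm job script: a deep learning training run queued
# alongside traditional modeling and simulation jobs.
#SBATCH --job-name=dl-train
#SBATCH --partition=gpu          # assumed accelerator partition name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=04:00:00

# The same resource manager that schedules simulation jobs launches
# the (hypothetical) distributed training script across the allocation.
srun python train.py --epochs 10
```

Submitted with `sbatch`, this job competes for resources like any other batch job, so idle AI hardware can be kept busy by the scheduler rather than by manual coordination.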

Managing Resources

To manage and optimize distributed applications, services, and big data frameworks, Intel recommends using Univa Grid Engine or the open-source Magpie for SLURM.

Both solutions distribute data center resources to create a single virtual pool that runs across bare-metal servers, virtual machines, and cloud instances.
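Conceptually, Magpie works by bootstrapping a big-data framework inside an ordinary batch allocation, so the scheduler treats the analytics job like any HPC job. The sketch below follows the general style of Magpie's submission templates; the specific variable values are illustrative assumptions, so consult the project's own scripts for exact names:

```shell
#!/bin/bash
# Sketch of a Magpie-style submission: Spark is started inside a normal
# Slurm allocation, run, and torn down when the allocation ends.
#SBATCH --nodes=8
#SBATCH --time=02:00:00

export MAGPIE_JOB_TYPE="spark"   # framework to bootstrap in the job
export SPARK_JOB="sparkpi"       # illustrative example job name

# Magpie's wrapper scripts (invoked from the template) launch the Spark
# daemons on the allocated nodes and submit the job to them.
```

Because the framework lives and dies with the batch allocation, no standing Spark or Hadoop cluster needs to be maintained alongside the HPC machine.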

Tying it all together into a unified data environment

To create a unified environment, Intel recommends using the open-source Alluxio storage abstraction for all pooled clusters.

Succinctly, Alluxio creates a single point of access to data so applications can transparently access data in-place without complex, time-consuming configuration requirements. Eliminating the need to move or duplicate data around the enterprise creates significant performance and efficiency gains.
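The "single point of access" idea can be sketched with Alluxio's command-line interface, which mounts external stores into one namespace. The store URIs and mount points below are hypothetical:

```shell
# Mount two disparate stores under one Alluxio namespace (paths and
# endpoints are hypothetical examples).
alluxio fs mount /warehouse hdfs://namenode:9000/warehouse
alluxio fs mount /training  s3://example-bucket/training-data

# Any cluster in the pool can now browse and read the data in place,
# without copying it between sites:
alluxio fs ls /training
```

Applications address `/warehouse` and `/training` through Alluxio regardless of where the bytes physically live, which is what eliminates the data movement and duplication described above.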

Optimized servers lie at the heart of every cluster

Servers are the heart of the cluster. Unfortunately, organizations might be tempted to add AI piecemeal to their existing HPC compute nodes, resulting in servers that run a hodgepodge of incompatible hardware and software. Or they may feel they must invest in a separate cluster. Instead, Intel Select Solutions provide extensively workload-optimized hardware and software configurations for HPC-AI-HPDA workloads.

Learn more about how your HPC clusters can use these solutions to support all your AI and HPC workloads without the cost of a separate cluster.