In this guest post, Trish Damkroger, Vice President and General Manager of the Technical Computing Initiative in Intel’s Data Center Group, covers how to scale your HPC environment for AI workloads.
High performance computing (HPC) has a long history of solving complex problems. Today’s advancements in artificial intelligence (AI), combined with HPC, make it possible to address unique and challenging workloads faster than ever before. Because AI workloads are driven by learning algorithms rather than fixed rules, they can ‘learn’ to derive deeper insights from data sets than rule-based analytics and data processing applications can. For these reasons, academic researchers and government agencies increasingly embrace the robust combination of HPC and AI.
Bringing AI-based workflows into an HPC environment is no easy feat. To help kick-start the planning process, we offer five important considerations below. For more detailed information, you can also read our eGuide focused on bringing AI into HPC environments.
Think holistically about your HPC needs and solution. Providing your stakeholders with the ideal AI-enabled HPC environment depends on software, hardware, and human skills alike. Academic and government environments rely on HPC systems that support many users with unique workloads, so system flexibility is key.
Software selection. By first choosing the software your intended workflows require, you can more easily plan for and optimize the physical HPC infrastructure that supports them. HPC systems enabling research through AI, visualization, simulation, and modeling workflows benefit from software offered by Intel, the open source community, and independent software vendors (ISVs).
HPC applications and development environment. If available applications cannot address your unique HPC usage scenarios, developers must create new software or modify existing code. While the HPC community offers libraries to assist in this endeavor, developers writing applications for HPC and AI may need specialized skills, such as optimizing code for parallel computing. Intel’s HPC interoperable framework assists developers with tools to modernize applications for advanced workloads and with support for development languages like Python, C++, and Fortran. For more technical information about languages and frameworks, please check out the eGuide.
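To make the parallel-computing point concrete, here is a minimal sketch, assuming a simple embarrassingly parallel workload, that uses Python’s standard multiprocessing module to spread work across cores. The score_sample function is a hypothetical placeholder, not part of any Intel framework or library.

```python
# Minimal sketch of shared-nothing parallelism using Python's
# standard-library multiprocessing module. The worker function
# is a hypothetical stand-in for a per-sample computation, e.g.
# one simulation step or one model evaluation.
from multiprocessing import Pool
import math

def score_sample(x: float) -> float:
    # Placeholder computation; replace with your real workload.
    return math.sqrt(x) * math.log(x + 1.0)

if __name__ == "__main__":
    samples = [float(i) for i in range(1, 1_000_001)]
    # Fan the work out across all available cores; a large
    # chunksize keeps inter-process communication overhead low.
    with Pool() as pool:
        results = pool.map(score_sample, samples, chunksize=10_000)
    print(f"Processed {len(results):,} samples")
```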
Physical infrastructure. Wherever possible, make the best use of your existing HPC infrastructure. By evaluating system elements like processors, storage, fabric, and memory against your users’ software requirements, you can more effectively identify potential bottlenecks. If current hardware impedes performance, upgrades may be needed. Planning and budgeting for updated system infrastructure will maximize return on investment (ROI) and avoid over-provisioning.
Validate your HPC technology first. Organizations lacking in-house HPC experts should consider support from Intel, a consultant, or an original equipment manufacturer (OEM) to accelerate system upgrades and deployment. Before a full-scale rollout, it pays to validate a test system, both for performance and for the value of the data insights its applications deliver. Once the validation process demonstrates the needed outcomes, prepare for deployment at scale, along with ongoing administration and maintenance.
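As one illustration of what a lightweight performance check on a test system might look like, the hypothetical sketch below times a dense matrix multiplication with NumPy and reports effective throughput. It is a simple smoke test for this post, not an official Intel validation procedure.

```python
# Hypothetical smoke test: time a dense matrix multiply to
# sanity-check a test node's floating-point throughput before
# committing to a full-scale rollout.
import time
import numpy as np

n = 4096
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b  # result kept so the multiply is actually performed
elapsed = time.perf_counter() - start

# A dense n x n matrix multiply performs roughly 2 * n^3
# floating-point operations.
gflops = 2 * n**3 / elapsed / 1e9
print(f"{n}x{n} matmul: {elapsed:.2f} s, ~{gflops:.1f} GFLOP/s")
```

Running such a check on a single node first, and comparing the result against the hardware’s expected peak, gives a quick signal of whether the software stack is configured sensibly before scaling out.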
To find out how Intel’s HPC technologies can ready your organization for AI, talk to your preferred system provider, or learn more at intel.com/hpc. Please also see the links below for helpful information:
eGuide: Bringing AI Into Your Existing HPC Environment, and Scaling It Up
Intel® Xeon® Scalable Processor for HPC
Accelerating AI with Intel Omni-Path Architecture
Trish Damkroger is Vice President and General Manager of the Technical Computing Initiative in Intel’s Data Center Group.