insideHPC Guide to QCT Platform-on-Demand Designed for Converged Workloads – Part 2

Not too long ago, building a converged environment spanning two domains, High Performance Computing (HPC) and Artificial Intelligence (AI), required spending a lot of money on proprietary systems and software in the hope that they would scale as business demands changed.

As this insideHPC technology guide, “insideHPC Guide to QCT Platform-on-Demand Designed for Converged Workloads,” shows, relying on open source software and the latest high-performance, low-cost system architectures makes it possible to build scalable hybrid on-premises solutions that satisfy the needs of converged HPC/AI workloads while remaining robust and easily manageable.

The converged Platform-on-Demand solution from QCT

Building a cluster to satisfy the requirements of both HPC and AI has some significant challenges:

  • Introducing an unfamiliar and complex system environment to new users and application developers looking to achieve the highest possible performance.
  • Providing system administrators with the tools they need to efficiently configure, monitor, and analyze the health and performance of a large cluster.
  • Providing additional tools for comprehensive account management, allocation, and control of compute, storage, and networking resources.

Taking these challenges into consideration, QCT has designed a unique Platform-on-Demand, or POD: an on-premises, rack-level system that offers best-practice hardware and software integration for both HPC and AI workloads. Leveraging QCT’s own system administration tools, the QCT POD comes pre-configured and pre-validated, ensuring rapid deployment and easy resource management.

QCT POD is constructed from a set of common building blocks to ensure a high degree of flexibility and scalability: a Management Building Block, a Compute Building Block, and a Storage Building Block, each connected by a network fabric that can be customized to fit user workload demands.

  • The Management Building Block, based on Red Hat® Enterprise Linux® or CentOS, is a software stack that offers a wide selection of web-based administrative and monitoring tools and dashboards that promote efficient cluster management.
  • The Compute Building Block delivers just the right hardware and software combinations to fulfill various workloads, such as HPC, ML, data analytics, cloud service, and edge computing. With a comprehensive hardware portfolio tailored to each domain and industry, QCT delivers configurations designed to solve customers’ unique challenges and achieve better performance. Kubernetes and Docker form the base for orchestration in the ML building block and are extendable to other frameworks, such as TensorFlow, Keras, and PyTorch. QCT’s flexible POD design meets many diverse demands and solves most customer challenges.
  • The Storage Building Block, designed with the specific HPC and AI requirements for high IOPS and low latency in mind, is now available to customers from all industries with similar demands, such as finance, engineering, life sciences, and energy, among others. Today, with AI and Machine Learning (ML) becoming standard practice, these workloads are having a huge impact on storage requirements, especially for large file and block storage.

Over the next few weeks we’ll explore QCT’s Platform-on-Demand designed for converged workloads.

Download the complete insideHPC Guide to QCT Platform-on-Demand Designed for Converged Workloads courtesy of QCT.