This article describes the challenges that users face and the solutions available to make running cloud based HPC applications a reality. You’ll learn about different cloud computing models, potential economic savings and factors to consider when comparing an on-site data center with a cloud-based provider.
A successful example of how a well-managed GPU cluster allowed scientist to focus on obtaining results comes from the Tokyo University of Agriculture and Technology (TUAT) results. A research group lead by Dr. Akinori Yamanaka develops computation models and simulates engineering materials, for a variety of applications, using HPC. Using Bright Cluster Manager, Dr. Yamanaka and his team were able to immediately focus on algorithm development and not burden the team with cluster administration issues.
For some applications, cloud based clusters may be limited due to communication and/or storage latency and speeds. With GPUs, however, these issue are not present because application running on cloud GPUs perform exactly the same as those in your local cluster — unless the application span multiple nodes and are sensitive to MPI speeds. For those GPU applications that can work well in the cloud environment, a remote cloud may be an attractive option for both production and feasibility studies.
As an open source tool designed to navigate large amounts of data, Hadoop continues to find new uses in HPC. Managing a Hadoop cluster is different than managing an HPC cluster, however. It requires mastering some new concepts, but the hardware is basically the same and many Hadoop clusters now include GPUs to facilitate deep learning.
In a perfect world, there would be one version of all compilers, libraries, and profilers. To make things even easier, hardware would never change. However, technology marches forward, and such a world does not exist. Software tool features are updated, bugs are fixed, and performance is increased. Developers need these improvements but at the same time must manage these differences.
HPC developers want to write code and create new applications. The advanced nature of HPC often requires that this process be associated with specific hardware and software environment present on a given HPC resource. Developers want to extract the maximum performance from HPC hardware and at the same time not get mired down in the complexities of software tool chains and dependencies.
When discussing GPU accelerators, the focus is often on the price-to-performance benefits to the end user. The true cost of managing and using GPUs goes far beyond the hardware price, however. Understanding and managing these costs helps provide more efficient and productive systems.
If you open the back of today’s HPC cluster you will see lots of cables. There are any number of cables including those for power, Ethernet, InfiniBand , Fibre Channel, KVM, and others. The current situation creates the need for complex configuration and administration.
Hadoop configuration and management is very different than that of HPC clusters. Develop a method to easily deploy, start, stop, and manage a Hadoop cluster to avoid costly delays and configuration headaches. Hadoop clusters have more “moving software parts” than HPC clusters; any Hadoop installation should fit into an existing cluster provisioning and monitoring environment and not require administrators to build Hadoop systems from scratch. Learn about managing a Hadoop cluster from the insideHPC article series on Successful HPC Clusters.
Make sure you use Cloud services that are designed for HPC applications including high-bandwidth, low-latency networking, exclusive node use, and high performance compute/storage capabilities for your application set. Develop a very flexible and quick Cloud provisioning scheme that mirrors your local systems as much as possible, and is integrated with the existing workload manager. An ideal solution is where your existing cluster can be seamlessly extended into the Cloud and managed/monitored in the same way as local clusters. Read more from the insideHPC Guide to Managing HPC Clusters.