This article is the third in an editorial series that explores the benefits the HPC community can achieve by adopting HPC virtualization and secure private cloud technologies.
Virtualization is a proven architectural approach to the many challenges described in last week’s article.
By creating a virtualized infrastructure, the IT organization ensures that:
• Departments, principal investigators and other key stakeholders receive the HPC resources they need when they need them
• Clusters and cluster nodes can be sized to meet specific application requirements
• Different operating systems and software stacks can be hosted simultaneously on the same infrastructure and adjusted dynamically
• IT can make efficient use of the underlying host hardware even though individual user jobs may only require a small number of CPUs
• Hardware can be shared while providing fault and security separation between users
• Policies can be enacted that allow high priority jobs to receive a higher “fairshare” of the underlying resources
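As a rough illustration of the last point, a share-based policy can be thought of as dividing a pool of CPUs in proportion to per-job weights. The sketch below is a minimal example under that assumption; the function name, the job labels and the weights are hypothetical and do not represent any particular scheduler's or hypervisor's API.

```python
def fairshare_allocation(total_cpus, shares):
    """Split total_cpus among jobs in proportion to their share weights.

    `shares` maps a job name to a positive weight (hypothetical values
    chosen by an administrator); a larger weight means a larger slice.
    """
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("at least one job must have a positive share")
    return {
        job: total_cpus * weight / total_shares
        for job, weight in shares.items()
    }


# A high-priority job weighted 3:1 against a low-priority job.
allocation = fairshare_allocation(64, {"high": 3, "low": 1})
```

With 64 CPUs and a 3:1 weighting, the high-priority job is entitled to three quarters of the pool when both jobs are competing for it.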
For HPC environments, wrapping the virtualization infrastructure in a secure private cloud provides the most value to both the end users and the IT organization.
This approach enables self-provisioning, allowing researchers and engineers to instantiate the resources they need for a particular project without waiting for IT to create the resource for them. To instantiate a virtual HPC cluster, the user applies a previously defined blueprint that specifies the required virtual machine (VM) attributes, the number of machines involved, and the needed software – including the operating system and middleware. Users can fully customize the VMs to meet their specifications. The blueprint also allows the centralized IT organization to enforce corporate IT mandates – for example, security and data protection policies.
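To make the blueprint idea concrete, the following sketch models a blueprint as a small data structure and expands it into per-VM specifications. It is an illustration only: the class, field names, the disk-encryption policy and the node limit are all assumptions made for this example, not the format used by any specific cloud automation product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterBlueprint:
    """Hypothetical blueprint for a virtual HPC cluster."""
    name: str
    node_count: int            # number of VMs in the cluster
    vcpus_per_node: int        # VM attribute
    memory_gb_per_node: int    # VM attribute
    operating_system: str
    middleware: tuple = ()     # e.g. MPI library, batch scheduler
    encrypt_disks: bool = True  # stands in for an IT-mandated policy


def instantiate(blueprint, max_nodes_allowed=64):
    """Expand a blueprint into VM specs, enforcing assumed IT limits."""
    if blueprint.node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if blueprint.node_count > max_nodes_allowed:
        raise ValueError("request exceeds the node limit set by IT")
    if not blueprint.encrypt_disks:
        raise ValueError("IT policy requires encrypted disks")
    return [
        {
            "name": f"{blueprint.name}-node-{i:03d}",
            "vcpus": blueprint.vcpus_per_node,
            "memory_gb": blueprint.memory_gb_per_node,
            "os": blueprint.operating_system,
            "middleware": list(blueprint.middleware),
        }
        for i in range(blueprint.node_count)
    ]


# A researcher self-provisions a four-node cluster from a blueprint.
cfd = ClusterBlueprint("cfd", 4, 16, 64, "linux", ("mpi", "scheduler"))
vms = instantiate(cfd)
```

The user chooses the sizing and software, while the checks inside `instantiate` show where centrally defined mandates could be applied to every request.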
This solution assumes that end users will continue to run their familiar HPC batch schedulers within their virtual HPC clusters. At the same time, the VMware Distributed Resource Scheduler (DRS) and other components dynamically manage the placement and priority of virtual machines on the underlying physical resources.
What the end user sees is an HPC cluster that looks just like a standard bare metal cluster running a standard job scheduler – there is no indication that they are interacting with virtual machines. This allows multiple engineering or research clusters to be instantiated on the same physical infrastructure – all available through a private cloud.
Underneath it all, virtualization is handling load balancing, protection, network services and all the other fundamentals that allow for multi-tenancy on the physical hardware while still delivering high performance. Cloud automation provides policy-based governance and logical application modeling to make sure that multi-vendor, multi-cloud services are delivered at the right size and service level for the task that needs to be performed.
Virtualization and cloud automation are fundamental attributes of a software-defined data center, which allows IT to create private clouds that deliver agility and economies of scale while maintaining data sovereignty and governance.
Protecting Applications
Virtualization allows the adoption of advanced resiliency practices such as using telemetry from the underlying system to predict impending hardware failures and then proactively migrating the workload to another host to avoid application interruption. For example, the system would detect a potential fan failure or an increase in the rate of soft memory errors and take action to make sure the workload continues despite incipient system problems.
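The decision logic behind this kind of proactive migration can be sketched in a few lines. The example below is a simplified illustration, not a description of how any VMware component works: the telemetry fields, the threshold values and the least-loaded placement rule are all assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class HostTelemetry:
    """Hypothetical health readings for one physical host."""
    host: str
    fan_rpm: int
    soft_mem_errors_per_hour: float


def is_at_risk(t, min_fan_rpm=2000, max_soft_errors_per_hour=10.0):
    """Flag a host whose readings cross the (assumed) thresholds."""
    return (t.fan_rpm < min_fan_rpm
            or t.soft_mem_errors_per_hour > max_soft_errors_per_hour)


def plan_migrations(telemetry, placements):
    """Map each VM on an at-risk host to the least-loaded healthy host.

    `placements` maps VM name to its current host. Returns an empty
    plan if no healthy host is available.
    """
    at_risk = {t.host for t in telemetry if is_at_risk(t)}
    healthy = [t.host for t in telemetry if t.host not in at_risk]
    if not healthy:
        return {}
    # Count the VMs already running on each healthy host.
    load = {h: 0 for h in healthy}
    for host in placements.values():
        if host in load:
            load[host] += 1
    plan = {}
    for vm, host in sorted(placements.items()):
        if host in at_risk:
            target = min(healthy, key=lambda h: load[h])
            plan[vm] = target
            load[target] += 1
    return plan


readings = [
    HostTelemetry("h1", 900, 0.5),    # fan speed is dropping
    HostTelemetry("h2", 4000, 0.2),
    HostTelemetry("h3", 4100, 0.1),
]
plan = plan_migrations(readings, {"vm-a": "h1", "vm-b": "h1", "vm-c": "h2"})
```

In a real deployment the plan would be carried out with live migration so that the running job is not interrupted; that step is outside the scope of this sketch.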
This approach should also reduce the need for frequent checkpointing and restoration, resulting in increased overall job throughput.
Next week’s article will look at Virtualization and Workload Agility. If you prefer, the complete insideHPC Guide to Virtualization, the Cloud and HPC is available for download as a PDF from the insideHPC White Paper Library, courtesy of VMware.