The Long Rise of HPC in the Cloud

In this special guest feature from Scientific Computing World, Robert Roe investigates the growth in cloud technology for scientific, engineering, and HPC workflows, driven by application-specific hardware.

The cloud computing market has seen considerable development in the last few years, as users adopt cloud technologies across many business segments. However, the success of general-purpose, enterprise cloud technology has hampered the uptake of cloud in HPC, because HPC requires substantially more expensive hardware.

As cloud providers could capitalize on this ‘low-hanging fruit’ in the enterprise, there was little reason for them to cater to the more intensive computing demands of HPC users.

However, as the cloud market has matured, we have begun to see the introduction of HPC cloud providers, and even large public cloud providers such as Microsoft are introducing genuine HPC technology to the cloud. This change opens up possibilities for new users who wish either to augment their current computing capabilities or to take the initial plunge and try HPC technology without investing huge sums in an internal HPC infrastructure.

Leo Reiter, CTO of Nimbix, shares his views on the steps cloud providers have taken to meet the requirements of HPC users, as well as scientific and engineering users. Reiter stated: “Cloud computing has come a long way in recent years. Early on, your only choices were object storage and cheap, lightweight compute instances. Today we see myriad new options across all classes of service and for all types of users.”

A new breed of cloud infrastructure

Increasing options and application-specific cloud hardware are common themes among cloud providers in 2017. Many providers are moving away from the one-size-fits-all model for HPC, instead deploying hardware or networking capabilities that match the needs of the user community.

An example of this is the creation of vScaler, a cloud computing company which aims to specifically address the needs of HPC users by providing cloud capabilities through both private and public clouds.

David Power, CTO at vScaler, said: “vScaler is a project that we have been developing as a cloud-based HPC technology for the last three years. It is now being spun out as its own entity, and I am heading up that organization from a technical standpoint, developing specific cloud products for HPC use cases.”

Power was keen to stress that this is the same physical hardware you would see in an HPC cluster, from compute and storage to networking, so users can access real HPC in the cloud: “It’s modern technology, the same physical hardware we would use to deploy physical HPC systems; we have just put the cloud software on top of it to make sure we are maximizing the performance that we can get.”

“That widens the potential use case for cloud, as people from HPC can now leverage the cloud to run a lot of their simulations.” This is instead of the traditional method of procuring a data centre, which can take months and require considerable up-front investment. The other benefit of the cloud is the flexibility it provides when compared to the consumption model of traditional HPC.

Power explained that vScaler has already set up data centres across the UK, Ireland, and Germany, and the company expects to open more over the next two years. These new cloud services, which deploy high-performance servers linked through InfiniBand networks, are opening the door to HPC users who wish to make use of the cloud.

“Many people are starting to reach the limits of what you can do with a traditional, in-house cluster, either because of power or the staffing needed to manage these resources,” stated Power. “I think cloud provides not an alternative but a good way of augmenting your HPC capabilities with additional resources that are available on-demand.”

However, it is not only HPC users who are taking up this offer. Power stressed that this cloud technology is also used by non-traditional HPC users who want access to high-throughput or GPU-based servers: “We are starting to see people that would not have used HPC traditionally,” said Power. “If you look at SMEs or smaller companies that are doing new product design, rapid prototyping, AI or machine learning, these guys would not have been traditional HPC users.”

HPC-specific technology

In November 2015, IDC’s research vice president of high-performance computing, Steve Conway, said in an interview with Scientific Computing World that the use of cloud was increasing in HPC markets, but that current public cloud architectures stunted adoption of the technology.

Conway said: “Most of the public clouds are set up to manage embarrassingly parallel workloads, and so users are smart: they will send those embarrassingly parallel workloads off to the cloud and handle the other, less parallel, workloads in-house or some other way.” Embarrassingly parallel problems require little or no communication of results between tasks, which makes them much better suited to a cloud with limited interconnect speeds and I/O throughput, as the sketch below illustrates.
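To make Conway’s distinction concrete, here is a minimal, purely illustrative Python sketch of an embarrassingly parallel job (the `simulate` function is a hypothetical stand-in for any independent task). Each run proceeds in isolation, and results are only gathered at the end, so no fast interconnect is required:

```python
# Minimal sketch of an embarrassingly parallel workload: each task is
# independent, so no inter-task communication or fast interconnect is needed.
from multiprocessing import Pool
import random

def simulate(seed: int) -> float:
    """Hypothetical stand-in for one independent simulation run."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100_000)) / 100_000

if __name__ == "__main__":
    with Pool() as pool:
        # Each seed is processed in isolation; tasks never exchange messages.
        results = pool.map(simulate, range(32))
    print(f"mean of {len(results)} runs: {sum(results) / len(results):.4f}")
```

Because the tasks never exchange messages, a scheduler can scatter them across cheap, loosely coupled cloud instances without any scalability penalty.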

“If public clouds had architectures that were more friendly to a larger portion of HPC workloads, then inevitably a larger portion of HPC workloads would be sent to the cloud,” Conway concluded.

One of the biggest drivers for the increased use of cloud HPC is that providers are now deploying HPC-specific hardware in the cloud. While it will still take some time before we regularly see tens of thousands of cloud cores being ‘spun up’ for HPC on public clouds, it is necessary that the technology and underlying infrastructure are in place for cloud HPC to be successful.

Reiter, of Nimbix, said: “HPC hardware is very different, and while it can be emulated, it cannot be replaced. A simple example is the case of ‘tightly coupled’ workflows – these require high-speed interconnect technology. Applications that scale across many machines and pass millions of messages per second amongst themselves to coordinate parallel algorithms simply cannot afford the bottlenecks of commodity networks. They need much lower latency; otherwise the scalability of these algorithms breaks down very quickly. To solve high-performance computing problems, you need high-performance compute.”
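As a hedged illustration of the tight coupling Reiter describes, the sketch below uses the mpi4py library (it assumes a working MPI installation, and the workload is invented for the example). Every iteration performs a collective reduction across all ranks, so the latency of the interconnect, not raw compute, limits how far the algorithm scales:

```python
# Sketch of a tightly coupled workload using mpi4py.
# Run with e.g.: mpirun -n 4 python tightly_coupled.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

local = float(rank + 1)  # each rank's piece of the problem
for step in range(1000):
    # A global reduction every step: on a commodity network the latency
    # of these messages, not the compute, quickly becomes the bottleneck.
    total = comm.allreduce(local, op=MPI.SUM)
    local = total / size  # each update depends on every other rank's data

if rank == 0:
    print(f"value after 1000 coupled steps across {size} ranks: {local}")
```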

“Ultimately, this HPC hardware in the cloud enables larger-scale problems to be solved in less time. In cloud computing, time is money – even if unit costs are higher with HPC hardware, the total solution cost is usually lower than on utility infrastructure, because less compute time is required.”

Similarly, the technical team at vScaler has been hard at work optimizing its cloud technology to squeeze out as much performance as possible. “We have spent quite a while optimizing our cloud for HPC,” said Power. “If you look at the public cloud that providers were offering a few years ago, it was really for lightweight workloads – a couple of VMs doing web serving or content serving – but that intensive use case was not there, due to the inefficiencies of some of the virtualization technologies.”

“We have spent the last couple of years optimizing the cloud environments so that we are getting very close to bare-metal performance on our systems. The simplicity, ease of use, and rapid provisioning that you get with the cloud often came with a performance penalty, but we have narrowed that gap considerably.”

“With Nimbix putting in hardware that reflects what people would buy as part of an HPC cluster, we have taken the same approach and put in GPUs, InfiniBand networks, and parallel storage – all of the usual building blocks that you would see in HPC systems,” stated Power.

A change in mindset

This availability is driving a change of mindset among HPC users, engineers, and scientists, who can now easily access cloud technology either to augment their current computing capabilities or to adopt HPC technology for new and emerging applications.

Power stressed that there is a “convergence between necessity and the fact that the technology has been around for a few years and there have been a lot of success stories about companies and organizations running cloud-scale architectures and the benefits of it are starting to become fairly well understood.”

“We are not at the stage yet where everything is going to be in the cloud, but what I do believe is that we have certain use cases that work very well within a cloud environment.”

As Power describes, the cloud may not be right for everyone, and it is not designed to be, but there are a growing number of scenarios where the cloud can add value and save considerable money for an organization with growing computing requirements.

In Power’s opinion, there will be a mixed approach to using these HPC-based cloud services: “Traditional users, HPC centres, will always have their own internal cluster resources, and I do not think that will change any time soon,” he stated. “They may want to augment some of their research capabilities with external devices. This could be through cloud or collaboration with other research facilities to leverage those resources.”

“You will have a cloud-like infrastructure for research and HPC, and then you will have users that want to dip in and out of using cloud technology,” concluded Power.

One example of using the cloud as a tool to facilitate varied research infrastructure is the eMedLab facility. The eMedLab data centre was set up as a co-located data centre that supports a consortium of seven UK universities and research institutes: University College London (UCL), the Francis Crick Institute, King’s College London, the London School of Hygiene and Tropical Medicine, Queen Mary University of London, the Wellcome Trust Sanger Institute, and the EMBL European Bioinformatics Institute.

Jacky Pallas, director of research platforms at University College London, stated: “eMedLab was originally funded through the MRC (Medical Research Council) in the UK. One aspect of the £8.9 million grant was to establish a common shared infrastructure where we could hold very large data sets that would be used by multiple users, and then an elastic compute resource where people could bring their own data and use the central repository data to do their analyses.”

Pallas explained that the varied user requirements meant the facility needed to support many different types of workflow, as well as high-speed networking to move large data sets so they can be processed for research. “The user requirements were very varied, from people wanting to do sequence and genome analysis to medical imaging analysis and clinical record and phenotype data, so we decided to go for a cloud environment that would be flexible enough to meet the diverse needs of the research community that we were supporting.”

Pallas said that this computing facility is more focused on high-performance data analytics (HPDA) rather than true HPC. However, the networking requirements are similar, as the users require a fast interconnect and a focus on memory bandwidth across the entire compute infrastructure.

The eMedLab data centre is based on OpenStack, a free and open-source software platform for cloud computing. Pallas explained that the decision to use OpenStack for the eMedLab data centre was driven by prior use of cloud technology.

“Prior to establishing eMedLab, we had a small cluster held at Barts (St Bartholomew’s Hospital) to look at cardiovascular data,” said Pallas. “This was based on VMware, which was a technology that we already had in-house at UCL as part of the Farr Institute of Health Informatics.”

“By the time we started looking at the design for eMedLab, OpenStack was more mature than it had been, so we felt it was less of a risk to adopt this open-source cloud technology. The other reason was the cost-effectiveness of the options the integrators presented to us.”

Using cloud technology like OpenStack on what would be considered a supercomputer can provide huge flexibility in how that system is used. The cloud infrastructure allows users to quickly provision compute resources – up to 6,000 cores in some cases – to carry out research projects.
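As a rough sketch of that self-service provisioning model, the following example uses the openstacksdk Python library to launch a single server; the cloud, image, flavor, and network names are placeholders for illustration, not eMedLab’s actual configuration:

```python
# Illustrative self-service provisioning on an OpenStack cloud via openstacksdk.
# Credentials and region are read from clouds.yaml or OS_* environment variables.
import openstack

conn = openstack.connect(cloud="my-cloud")  # hypothetical cloud profile name

image = conn.compute.find_image("ubuntu-22.04")      # hypothetical image name
flavor = conn.compute.find_flavor("m1.xlarge")       # hypothetical flavor name
network = conn.network.find_network("research-net")  # hypothetical network name

# Request the server and block until it is ACTIVE.
server = conn.compute.create_server(
    name="analysis-node-01",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(f"{server.name} is {server.status}")
```

In practice a researcher would provision many such instances, or use orchestration templates, but the point stands: resources appear in minutes rather than through a months-long procurement.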

This flexibility helps to maximize utilization of the system, but it also makes provisioning resources much easier, particularly for bioinformatics domain experts who may have little experience in traditional HPC.

While no one technology can solve the problems faced by every engineer and scientist, the cloud increasingly offers the flexibility to suit varied workloads. Another benefit of the cloud, as the eMedLab example demonstrates, is that it facilitates the sharing of data between multiple organizations and can be used to co-locate facilities and data, further reducing costs because multiple copies of large data sets become unnecessary.

In the opinion of UCL’s Jacky Pallas, the cloud is a particularly powerful tool for scientific research. “Now we are in an era of big data and team science, where no one university or research group can solve all the complex challenges we are faced with. Cloud offers that flexibility to address multiple questions within a research project.”

This story appears here as part of a cross-publishing agreement with Scientific Computing World.
