Choice Comes to HPC: A Year in Processor Development


In this special guest feature, Robert Roe from Scientific Computing World writes that a whole new set of processor choices could shake up high performance computing.

With new and old companies releasing processors for the HPC market, there are now several options for high-performance server CPUs. Setbacks and delays at Intel are compounding this, opening up competition in the HPC CPU market.

AMD has begun to find success with its EPYC brand of server CPUs. While market penetration will take some time, the company is starting to deliver competitive performance figures.

IBM supplied the CPUs for the Summit system, which currently holds the top spot on the latest list of the Top500, a biannual list of the most powerful supercomputers. While a single deployment is not a particularly strong measure of success, the Summit system has generated a lot of interest: five of the six Gordon Bell Prize finalists are running their applications on this system, which highlights the potential for this CPU – particularly when it is coupled with Nvidia GPUs.

Arm is also gathering pace, as its technology partners ramp up production of Arm-based CPU systems for use in HPC deployments. Cavium (now Marvell) was an early leader in this market, delivering the ThunderX processor in 2015; its follow-up, the ThunderX2, reached general availability in 2018.

There are a number of smaller test systems using the Cavium chips, but the largest is the Astra supercomputer being developed at Sandia National Laboratories by HPE. This system is expected to deliver 2.3 Pflops of peak performance from 5,184 ThunderX2 CPUs.

HPE, Bull and Penguin Computing have added the ThunderX2 CPU to their line-ups of products available to HPC users. Coupled with the availability of Allinea software tools, this is helping to establish a viable ecosystem for HPC users.

With many chip companies failing or struggling to gain a foothold in the HPC market over the last 10 to 20 years, it is important to provide a sustainable technology with a viable ecosystem for both hardware and software development. Once this has been achieved, Arm can begin to grow its market share.

Fujitsu is another high-profile name committed to the development of Arm HPC technology. The company has been developing its own Arm-based processor for the Japanese Post K computer, in partnership with Riken, one of the largest Japanese research institutions.

The A64FX CPU, developed by Fujitsu, will be the first processor to feature the Scalable Vector Extension (SVE), an extension of the Armv8-A instruction set designed specifically for supercomputing architectures.

It offers a number of features, including broad utility supporting a wide range of applications, massive parallelization through the Tofu interconnect, low power consumption, and mainframe-class reliability.
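A distinctive property of SVE is that it is vector-length agnostic: the same binary runs on any hardware vector width from 128 to 2048 bits, with a predicate register masking off inactive lanes in the final partial iteration. The sketch below models that predicated-loop structure in Python; the 512-bit width matches what Fujitsu has described for the A64FX, but the helper itself is purely illustrative and not real SVE intrinsics.

```python
# Illustrative model of an SVE-style vector-length-agnostic loop.
# The hardware chooses the vector length; the code never hard-codes it.
# This is a conceptual sketch, not actual SVE/ACLE intrinsics.

VECTOR_BITS = 512          # the A64FX implements 512-bit SVE registers
LANES = VECTOR_BITS // 64  # 64-bit (double-precision) lanes per vector

def vector_add(a, b):
    """c[i] = a[i] + b[i], processed LANES elements at a time."""
    n = len(a)
    c = [0.0] * n
    i = 0
    while i < n:
        # SVE's 'while less-than' predicate keeps only lanes with
        # i + lane < n active, so the final partial vector is handled
        # without a separate scalar tail loop.
        active = min(LANES, n - i)
        for lane in range(active):
            c[i + lane] = a[i + lane] + b[i + lane]
        i += LANES
    return c
```

Because the loop only ever asks "how many lanes does this hardware have?", the same logic would run unchanged on a 128-bit or a 2048-bit implementation.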

Fujitsu reported in August that the processor would be capable of delivering a peak double precision (64 bit) floating point performance of over 2.7 Tflops, with a computational throughput twice that for single precision (32 bit), and four times that amount for half precision (16 bit).
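The scaling Fujitsu describes follows from halving the operand width: each time the element size halves, twice as many elements fit in a vector register, doubling per-cycle throughput. A quick sketch of the arithmetic, using only the figures stated above:

```python
# Precision scaling for the A64FX as described in the article:
# narrower operands pack more lanes per vector, so throughput doubles
# each time the element width halves.

peak_fp64 = 2.7            # Tflops, double precision (64-bit), per Fujitsu

peak_fp32 = peak_fp64 * 2  # 32-bit operands: twice the lanes
peak_fp16 = peak_fp64 * 4  # 16-bit operands: four times the lanes

print(f"FP64: {peak_fp64:.1f} Tflops")   # 2.7
print(f"FP32: {peak_fp32:.1f} Tflops")   # 5.4
print(f"FP16: {peak_fp16:.1f} Tflops")   # 10.8
```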

Trouble at the top

Intel has struggled somewhat in recent months, as it has been reported that the next generation of its processors has been delayed due to supply issues and difficulties with the 10nm fabrication process.

The topic was addressed in August by Intel’s interim CEO Bob Swan, who reported healthy growth figures from the previous six months but also mentioned supply struggles and record investment in processor development.

“The surprising return to PC TAM growth has put pressure on our factory network. We’re prioritizing the production of Intel Xeon and Intel Core processors so that collectively we can serve the high-performance segments of the market. That said, supply is undoubtedly tight, particularly at the entry-level of the PC market. We continue to believe we will have at least the supply to meet the full-year revenue outlook we announced in July, which was $4.5 billion higher than our January expectations,” said Swan.

Swan stated that the 10nm fabrication process was moving along with increased yields and volume production was planned for 2019: “We are investing a record $15 billion in capital expenditures in 2018, up approximately $1 billion from the beginning of the year. We’re putting that $1 billion into our 14nm manufacturing sites in Oregon, Arizona, Ireland and Israel. This capital, along with other efficiencies, is increasing our supply to respond to your increased demand.”

While Intel is undoubtedly the king of the hill when it comes to HPC processors – with more than 90 per cent of the Top500 using Intel-based technologies – the advances made by other companies, such as AMD, the re-introduction of IBM and the maturing Arm ecosystem are all factors that mean that Intel faces stiffer competition than it has for a decade.

The Rise of AMD

The company made headlines at the end of 2017 when its new range of server products was released, and Greg Gibby, senior product manager of data centre products at AMD, expects the company to begin to see some momentum now that several ‘significant wins’ have been completed.

Microsoft has announced several cloud services that make use of AMD CPUs, and the two-socket products are also being deployed by Chinese companies: Tencent is using them for cloud-based services, while Baidu has adopted both CPUs and GPUs from AMD to drive its machine learning and cloud workloads.

AMD is generating huge revenue from its console partnerships with Sony and Microsoft.

While these custom CPUs do not directly impact HPC technology, the revenue has provided valuable time for AMD to get its server products ready. In 2018 the server line-up has been successful, and AMD is rumoured to announce 7nm products next year. If this comes to fruition, AMD could further bolster its potential to compete in the HPC market.

Gibby also noted that as performance is a key factor for many HPC users, it is important to get these products in front of the HPC user community.

He said: “I believe that as we get customers testing the EPYC platform on their workloads, they see the significant performance advantages that EPYC brings to the market. I think that will provide a natural follow-through of us gaining share in that space.”

One thing that could drive adoption of AMD products is the memory bandwidth improvement that was a key focus for AMD when developing the EPYC CPUs. Memory bandwidth has long been a potential bottleneck for HPC applications, but this has become much more acute in recent years.

In a recent interview with Scientific Computing World, Jack Wells, director of science at Oak Ridge National Laboratory, noted memory bandwidth as the number one requirement when surveying the Oak Ridge HPC users.

This was the first time that memory bandwidth had replaced peak node flops in the user requirements for this centre.

While AMD was designing the next generation of its server-based CPU line, it took clear steps to design a processor that could meet the demands of modern workloads.

Gibby noted that the CPU was not just designed to increase floating point performance, as there were key bottlenecks that the company identified, such as memory bandwidth that needed to be addressed.

“Memory bandwidth was one of the key topics we looked at, so we put in eight memory channels on each socket,” said Gibby. “So in a dual socket system, you have 16 channels of memory, which gives really good memory bandwidth to keep the data moving in and out of the core.”

“The other thing is on the I/O side. When you look at HPC specifically, you are looking at clusters with a lot of dependency on interconnects, whether it be InfiniBand or some other fabric.”

“A lot of the time you have GPU acceleration in there as well, so we wanted to make sure that we had the I/O bandwidth to support this.”
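To get a feel for what 16 channels implies, here is a rough back-of-the-envelope sketch. It assumes DDR4-2666 memory with the standard 64-bit channel width; those DDR4 figures are common JEDEC numbers and an assumption on our part, not quoted by Gibby.

```python
# Back-of-the-envelope peak memory bandwidth for a dual-socket EPYC node.
# Assumes DDR4-2666 (2666 mega-transfers/s) and a 64-bit (8-byte) bus per
# channel -- standard DDR4 parameters, not figures from the article.

transfers_per_sec = 2666e6   # DDR4-2666 data rate
bytes_per_transfer = 8       # 64-bit channel width
channels = 16                # 8 channels per socket x 2 sockets

per_channel_gbs = transfers_per_sec * bytes_per_transfer / 1e9
peak_gbs = per_channel_gbs * channels

print(f"Per channel: {per_channel_gbs:.1f} GB/s")   # ~21.3 GB/s
print(f"Dual-socket peak: {peak_gbs:.0f} GB/s")     # ~341 GB/s
```

Real applications see well below this theoretical peak, but the channel count is why EPYC's aggregate bandwidth stood out against contemporary six-channel competition.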

This story appears here as part of a cross-publishing agreement with Scientific Computing World.
