Advancing HPC with Collaboration & Co-design

In this special guest feature from Scientific Computing World, Tom Wilkie reports on two US initiatives for future supercomputers, announced at the ISC in Frankfurt in July.

The European high-performance computing conference, ISC, held in Frankfurt in July, was the venue for the announcement of two major US collaborations to develop next-generation data-centric supercomputing. They will focus on communications and applications, with a view to helping the ‘co-design’ of future systems.

Processor technology suitable for data-centric computing was also on display in Frankfurt, as Robert Roe reported in ‘Computer processors evolve to fit new data intensive niches’. But interconnects are also at a premium when heavy data loads mean intensive I/O, so Mellanox Technologies, which specializes in interconnect solutions for data centre servers and storage systems, announced a collaboration to develop a new open-source network communication framework for high-performance and data-centric applications. The work will involve the US Department of Energy’s Oak Ridge National Laboratory (ORNL), IBM, the University of Tennessee, Nvidia, and others.

At the same time, IBM along with Nvidia announced the establishment of a pair of Centers of Excellence for supercomputing – one at the US Lawrence Livermore National Laboratory and the other at ORNL. These two collaborations are in support of IBM’s part in the Coral procurement of next-generation US supercomputers: the Summit and Sierra systems to be delivered to Oak Ridge and Lawrence Livermore in 2017. The focus of this work, however, is on the applications that will run on the new machines.

Collaboration on communications

UCX

The Mellanox-led project, Unified Communication-X (UCX), will provide platform abstractions supporting various communication technologies for next generation programming models. Traditionally there have been three popular mainstream communication frameworks to support interconnect technologies and programming languages: MXM, developed by Mellanox; PAMI, developed by IBM; and UCCS, developed by ORNL, the University of Houston, and the University of Tennessee. UCX will unify the strengths and capabilities of each of these communication libraries and develop exascale programming models that are agnostic to the underlying interconnect and acceleration technology.

“By providing our advancements in shared memory, MPI, and underlying network transport technologies, we can continue to advance open standards-based networking and programming models,” said Gilad Shainer, vice president of marketing at Mellanox. Jim Sexton, director of data centric systems at IBM, stressed the open-source nature of the communication framework that will be developed: “IBM is contributing key innovations from our PAMI high-performance messaging software already in use in several Top10 supercomputing systems.”

In this slidecast, Pavel Shamis from ORNL and Gilad Shainer from Mellanox announce the UCX Unified Communication X Framework.

According to George Bosilca, research director at the Innovative Computing Laboratory, University of Tennessee, Knoxville: “The path to exascale requires programming models where communications and computations unfold together, collaborating instead of competing for the underlying resources. Providing holistic access to the hardware is a major component of any programming model or communication library.”

Applications and co-design

The two centers of excellence being established at Lawrence Livermore and at ORNL as part of the Coral procurement will focus on end-application software to complement the development of the hardware and system software for the Summit and Sierra machines. The intention is to develop end-user application software in tandem with the development of the systems, to generate feedback between the system developers and the application writers. This linkage is intended to ensure that the system design delivers a machine that can run the user applications efficiently and appropriately.

The two centers will address the programming challenges of writing software for IBM’s Power processors coupled with Nvidia Tesla GPU accelerators through the Nvidia NVLink high-speed processor interconnect. Summit and Sierra will use a data-centric approach that minimizes the movement of data. But according to Michel McCoy, program director for advanced simulation and computing at Lawrence Livermore, innovation in the application software is vital in ‘making sure our facilities are prepared to take advantage of the performance of the new supercomputers. The centers bring together the people who know the science, the people who know the code, and the people who know the machines.’

Applications developed at the centers will take advantage of innovations developed via the OpenPower community of developers, while developments at the centers will also benefit general-purpose OpenPower-based commercial systems.

The modeling and simulation applications span the sciences from cosmology to biophysics to astrophysics. One of Oak Ridge’s applications will focus on advancing Earth system models for climate research while another will map the Earth’s interior using big data for seismology research. Lawrence Livermore will develop applications for the US nuclear weapons program and other national security areas, including bio-security.

Other reports from ISC High Performance have focused on the issue of ‘Why do smaller companies shun HPC?’; on national policy strategies for ‘Easing access to HPC for the SME’; on how ‘A portal opens to German HPC centres’; and asked ‘Does the path to HPC for SMEs lie in the Cloud?’

Robert Roe offers a respite from policy-related issues by examining how ‘Computer processors evolve to fit new data intensive niches’, a look at new developments in processor technologies on display at ISC High Performance.

This story appears here as part of a cross-publishing agreement with Scientific Computing World.
