Slidecast: Nvidia/IBM to Build Two CORAL 100+ Petaflop Supercomputers in 2017


In this slidecast, Sumit Gupta from Nvidia describes the company’s big win with IBM for two DOE supercomputers for deployment in 2017. The two machines will feature IBM POWER9 processors coupled with Nvidia’s future Volta GPU technology. NVLink will be a critical piece of the architecture as well, along with a system interconnect powered by Mellanox.


IBM today announced that the U.S. Department of Energy has awarded IBM contracts valued at over $300 million to develop and deliver the world’s most advanced “data centric” supercomputing systems at Lawrence Livermore and Oak Ridge National Laboratories to advance innovation and discovery in science, engineering and national security. IBM’s new systems use a “data centric” approach that puts computing power everywhere data resides, minimizing data in motion and energy consumption.

The rapid growth and emerging importance of managing Big Data accelerate the opportunity for new discovery while compounding the challenges scientists face. The world is generating more than 2.5 billion gigabytes of data every day (equivalent to 250 million football fields full of books), requiring entirely new approaches to supercomputing.

The current approach to computing presumes a model of data repeatedly moving back and forth from storage to processor in order to analyze and access data insights. However, this approach becomes unsustainable with the onslaught of Big Data because of the significant amount of time and energy that massive and frequent data movement entails. The popular practice of putting design emphasis solely on faster microprocessors becomes progressively more untenable because the computing infrastructure is dominated by data movement and data management.

To address this issue, for the past five years IBM researchers have pioneered a new “data centric” approach to systems – an architecture that embeds compute power everywhere data resides in the system, allowing for a convergence of analytics, modeling, visualization, and simulation, driving new insights at incredible speeds.


IBM Systems Deliver Speed and Energy Efficiency to Labs

The Laboratories anticipate that the new IBM supercomputers will be among the fastest and most energy efficient systems following the installation thanks to the innovative Data Centric approach. The systems at each laboratory are expected to offer five to 10 times better performance on commercial and high-performance computing applications compared to the current systems at the labs, and will be more than five times more energy efficient.

These OpenPOWER-based systems will leverage the new Data Centric computing architecture to provide leading edge, cost-effective modeling, simulation, applications, and analytics on Big Data. The “Sierra” supercomputer at Lawrence Livermore and “Summit” supercomputer at Oak Ridge will each have a peak performance well in excess of 100 petaflops balanced with more than five petabytes of dynamic and flash memory to help accelerate the performance of data centric applications. The systems will be capable of moving data, when necessary, at more than 17 petabytes per second (which is equivalent to moving over 100 billion photos on Facebook in a second) to speed time to insights.
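As a rough sanity check on the figures quoted above, the short Python sketch below works out what they imply. It assumes decimal prefixes (1 petabyte = 10^15 bytes) and treats the quoted lower bounds ("more than 17 petabytes per second", "over 100 billion photos", "more than five petabytes") as exact values. Both are assumptions made for illustration, not numbers from IBM or the labs.

```python
# Back-of-the-envelope arithmetic on the quoted Sierra/Summit figures.
# Assumption: decimal prefixes, and the quoted lower bounds taken as exact.
PETA = 10**15

bandwidth_bytes_per_s = 17 * PETA   # "more than 17 petabytes per second"
photos_per_s = 100 * 10**9          # "over 100 billion photos ... in a second"
memory_bytes = 5 * PETA             # "more than five petabytes" of memory

# Photo size implied by the Facebook comparison.
implied_photo_bytes = bandwidth_bytes_per_s / photos_per_s

# Time to move the full memory footprint once at the quoted rate.
sweep_seconds = memory_bytes / bandwidth_bytes_per_s

print(f"Implied photo size: {implied_photo_bytes / 1000:.0f} kB")
print(f"Time to move 5 PB at 17 PB/s: {sweep_seconds:.2f} s")
```

Under these assumptions the comparison implies photos of roughly 170 kB each, and the entire five-petabyte memory footprint could in principle be moved in about a third of a second.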

The national laboratories offer researchers from academia, government, and industry access to time on their open computers to address grand challenges in science and engineering. Traditionally, the labs’ computers have been optimized to handle hardcore scientific problem solving, using techniques such as modeling and simulation. Increasingly, researchers are seeking help with projects in diverse domains such as healthcare, genomics, economics, financial systems, social behavior and visualization of large and complex data sets. Therefore, the computing systems need to help manage and sort data as well as extract useful information to help solve the world’s toughest problems.

This Data Centric architecture pioneered by IBM will be transformational for science and national security applications as well as for industries such as healthcare, manufacturing, engineering, oil and gas. The Sierra and Summit systems will be used for the most mission-critical applications and represent the next major phase in the U.S. Department of Energy’s scientific computing roadmap to exascale computing.

Open Approach Leveraging OpenPOWER

The capability to generate, access, manage and operate on ever larger amounts and varieties of data requires changing the nature of traditional computing to be built on open technology platforms. Organizations need to look at the data challenge holistically – from systems design to how decisions will be made. This means looking at the data from when it was “born” to how it goes through the lifecycle of a solution driven workflow to create an insight – from data preparation, to processing, to visualization, and back through multiple iterations.

The incorporation of OpenPOWER technologies into a modular integrated system will enable Lawrence Livermore and Oak Ridge to customize the Sierra and Summit system configurations for their specific needs.

Working with IBM, NVIDIA developed the advanced NVIDIA NVLink interconnect technology, which will enable CPUs and GPUs to exchange data five to 12 times faster than they can today. NVIDIA NVLink will be integrated into IBM POWER CPUs and next-generation NVIDIA GPUs based on the NVIDIA Volta architecture, allowing Sierra and Summit to achieve unprecedented performance levels. With Mellanox, IBM is implementing a state-of-the-art interconnect that incorporates built-in intelligence, to improve data handling.

“Today’s announcement marks a dramatic departure from traditional supercomputing approaches that are no longer viable as data grows at enormous rates. IBM’s Data Centric approach is a new paradigm in computing, marking the future of open computing platforms and capable of addressing the growing rates of data,” said Tom Rosamilia, senior vice president, IBM Systems and Technology Group. “The beauty of the systems being developed for Lawrence Livermore and Oak Ridge is that the core technologies are available today to organizations of many sizes across many industries.”

IBM is delivering Data Centric technologies to clients today, including IBM POWER8, IBM Elastic Storage Server, IBM Elastic Storage software (based on General Parallel File System technology), and IBM Platform Computing software.

IBM Research will work with Lawrence Livermore and Oak Ridge on scientific collaborations centered on these systems and help develop tools and technologies to optimize codes to achieve the best performance on the acquired systems.

While current Power Systems and OpenPOWER technologies are being used today to start the required programming efforts, the new Data Centric systems are slated to be installed and deployed in the labs by 2017-2018.

