Over at the Xcelerit Blog, Jörg Lotze and Hicham Lahlou write that code portability is the key to success in a hybrid computing world with so many available processing architectures.
As a result, compromises are often made: typically, easy maintenance is favoured and performance is sacrificed. That is, the code is developed for a standard CPU and not optimised for any particular platform, because maintaining separate code bases for different accelerator processors is a difficult task whose benefit is either unknown in advance or does not justify the effort. The best solution, however, is a single, easy-to-maintain code base written in such a way that it can run on a wide variety of hardware platforms – for example using the Xcelerit SDK. This makes it possible to exploit hybrid hardware configurations to best advantage while keeping the code portable to future platforms.
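To make the single-source idea concrete, here is a minimal sketch in plain C with OpenMP. It is not the Xcelerit SDK API; it only illustrates how one kernel source can run serially or threaded on any multicore CPU depending solely on a compiler flag, which is the same principle such frameworks extend to accelerators.

#include <stdio.h>

/* One kernel source. Built with -fopenmp it runs threaded across all
   available cores; built without it, the pragma is ignored and the
   loop runs serially. No per-platform code branches are needed. */
static void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    enum { N = 1000000 };
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(N, 3.0f, x, y);
    printf("y[0] = %f\n", y[0]); /* expect 5.0 */
    return 0;
}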
In retrospect, Roadrunner could be viewed as something of a design cul-de-sac, created by the artificial goal of the petaflop milestone. But it’s notable that even in the contrived race to a quadrillion flops, something of worth endured. Although the PowerXCell 8i was a commercial dead end, x86/accelerator combo servers took off and are now sold by every HPC system vendor, IBM included. For the time being, accelerators offer the only commodity-based technology that delivers multi-petaflops of supercomputing in reasonable power envelopes, not to mention tiny systems with multi-teraflops capability. The energy efficiency of these accelerators, compared to standard processors, is driving the technology into mainstream HPC and is stretching the number of FLOPS that can be squeezed into a datacenter or a deskside cluster.
Tornado Supercomputer. Pictured (from left to right): Oleg Aladyshev (JSCC RAS), Jack Dongarra, Pavel Telegin (JSCC RAS), Alexey Ovsyannikov (JSCC RAS), and Boris Shabanov (Deputy Director, JSCC RAS)
On a recent trip to Russia, renowned HPC expert Jack Dongarra visited two of Europe’s top Intel Xeon Phi supercomputer sites, deployed by RSC Group, Russia’s leading builder of innovative HPC solutions. The systems, the first Xeon Phi supercomputers outside the USA to be ranked on the Top500 and Green500 lists, are installed at South Ural State University (SUSU) and the Joint Supercomputer Center of the Russian Academy of Sciences (JSCC RAS).
“Both SUSU and JSCC RAS are state-of-the-art high performance computing centers with competent staff running powerful, energy-efficient supercomputers that rank highly on the Top500 and Green500 lists,” said Jack Dongarra. “Both facilities use RSC Tornado based systems with innovative liquid cooling and the newest Intel Xeon Phi coprocessors, which provide impressive high-performance and energy-efficient solutions to very demanding science research and engineering problems.”
Dongarra was very impressed by the high level of energy efficiency and the world-record computing density (up to 181 TFLOPS per rack) and power density (up to 100 kW per rack), achieved in a very small footprint thanks to the RSC Tornado liquid cooling technology implemented in both Russian projects.
“This is a very simple and economical way to do it – in terms of the space and power used – which provides a good environment for the computer systems as well as for the people who take care of them. I see here a rather small room equipped with a very powerful supercomputing system. I think this is a good sign of the well-done engineering and planning that have gone into the construction of this computing facility.”
In this video from Moabcon 2013, Troy Baer presents: NICS, Adaptive Computing, and Intel: Leadership in HPC.
“An Appro Xtreme-X Supercomputer named Beacon, deployed by the National Institute for Computational Sciences (NICS) of the University of Tennessee, tops the current Green500 list, which ranks the world’s fastest supercomputers based on their power efficiency. To earn its number-one ranking, the supercomputer employed Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors to produce 112.2 trillion calculations per second using only 44.89 kW of power, resulting in world-record efficiency of 2.499 billion floating point operations per second per watt.”
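That efficiency figure is straightforward to verify: 112.2 teraflops is 112,200 gigaflops, and 112,200 GFLOPS divided by 44,890 watts comes to roughly 2.499 GFLOPS per watt.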
This week Mellanox announced that its end-to-end FDR InfiniBand technology is powering the Stampede supercomputer at TACC. As the most powerful supercomputing system in the NSF XSEDE program, the 10 Petaflop Stampede system integrates thousands of Dell servers and Intel Xeon Phi coprocessors with Mellanox FDR 56Gb/s InfiniBand SwitchX-based switches and ConnectX-3 adapter cards.
“The InfiniBand network was easy to deploy and delivers incredible application performance on a consistent basis,” said Tommy Minyard, director of Advanced Computing Systems, TACC. “Utilizing Mellanox FDR 56Gb/s InfiniBand provides us with extremely scalable high performance, a critical element as Stampede is designed to support hundreds of computationally- and data-intensive science applications from around the United States and the world.”
Stampede supports national scientific research into weather forecasting, climate modeling, drug discovery and energy exploration and production. Read the Full Story.
The book benefits software engineers, scientific researchers, and high-performance and supercomputing developers in need of high-performance computing resources by:
Providing a guide to exploiting the parallel power of the Intel Xeon Phi coprocessor for high-performance computing
Presenting best practices for portable, high-performance computing and a familiar and proven threaded, scalar-vector programming model (sketched after this list)
Including simple but informative code examples that explain the unique aspects of this new highly parallel, high-performance product
Covering wide vectors, many cores, many threads, and high-bandwidth cache/memory architecture
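For a taste of that threaded, scalar-vector model, here is a minimal sketch of our own (not an excerpt from the book): an outer loop that OpenMP spreads across the coprocessor’s many cores and threads, with a unit-stride inner loop the compiler can map onto its wide vector units.

#include <stdio.h>

#define N (1 << 20)
#define CHUNK 4096

int main(void)
{
    static float in[N], out[N];
    for (int i = 0; i < N; i++)
        in[i] = (float)i;

    /* Threads: OpenMP distributes the chunks across cores.
       Vectors: the inner loop is unit-stride with no dependences,
       so the compiler can issue wide SIMD instructions for it. */
    #pragma omp parallel for
    for (int c = 0; c < N; c += CHUNK)
        for (int i = c; i < c + CHUNK; i++)
            out[i] = 2.0f * in[i] + 1.0f;

    printf("out[N-1] = %f\n", out[N - 1]);
    return 0;
}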
I got my hands on a preliminary copy of the book back in November at SC12, and I can tell you that Jim and James did a great job.
The book release coincides with the formal dedication of the Stampede supercomputer at the Texas Advanced Computing Center in Austin. Stampede is currently ranked number seven on TOP500, with over 6400 Intel Xeon Phi coprocessors. Jeffers and Reinders have today committed several hundred books to support TACC’s training efforts for Stampede.
Today Penguin Computing announced the availability of the Relion 2808GT, a high-density server platform that supports eight GPUs or coprocessors in only 2U. Designed for scientific and engineering applications, the Relion 2808GT is tailor-made for popular codes such as Matlab, Amber and Abaqus.
“Penguin has been delivering integrated GPU computing clusters since version 1.0 of this technology,” said CEO Charles Wuischpard. “The new Relion 2808GT platform in conjunction with the latest GPU and coprocessor technology delivers unprecedented levels of performance. The Relion 2808GT enables our HPC customers to further accelerate their research by shortening the time to result for their simulations.”
In terms of computational density, a fully configured server with eight NVIDIA K20 GPUs can achieve 28 TFLOPS of single precision floating point performance.
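That figure lines up with the K20’s published peak of roughly 3.52 TFLOPS single precision per card: 8 × 3.52 TFLOPS ≈ 28.2 TFLOPS.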
In this video from the HPC Advisory Council Switzerland Conference, Sadaf Alam from the Swiss Supercomputing Center presents: Direct MPI from NVIDIA Tesla and Intel Xeon Phi Accelerator Memories on an InfiniBand Cluster.
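As a rough illustration of what direct MPI from accelerator memory looks like from the application side, here is a minimal sketch assuming a CUDA-aware MPI library (for example, recent MVAPICH2 or Open MPI builds), which accepts GPU device pointers directly so the application never stages data through host memory:

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *dev_buf;                       /* buffer in GPU memory */
    cudaMalloc((void **)&dev_buf, n * sizeof(float));

    if (rank == 0) {
        cudaMemset(dev_buf, 0, n * sizeof(float));
        /* The device pointer is passed straight to MPI_Send; the
           CUDA-aware MPI library moves the data over InfiniBand
           without an explicit copy to a host buffer. */
        MPI_Send(dev_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(dev_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d floats directly into GPU memory\n", n);
    }

    cudaFree(dev_buf);
    MPI_Finalize();
    return 0;
}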
Today ClusterVision announced the installation of a 200 Teraflop supercomputer at the University of Paderborn. With 614 compute nodes and 10,000 cores, the hybrid system will run a wide range of commercial and open source HPC applications in technology and science. The system also includes 32 NVIDIA K20 GPUs and 8 Intel Xeon Phi coprocessors, providing an additional 40 Teraflops of compute power.
“This system is a powerful compute resource for all researchers in the region of East Westphalia and Lippe, and for our partners in Germany and Europe,” said Prof. Dr. Holger Karl, head of the PC2 board.
With a system interconnect powered by Mellanox QDR InfiniBand, the Paderborn cluster uses Dell PowerVault MD3200 storage components running FraunhoferFS (FhGFS), the parallel file system. Read the Full Story.
Stampede is one of the largest computing systems in the world for open science research. Stampede system components are connected via a fat-tree FDR InfiniBand interconnect. One hundred and sixty compute racks house compute nodes with dual eight-core sockets, and feature the new Intel Xeon Phi coprocessors. Additional racks house login, I/O, big-memory, and general hardware management nodes. Each compute node is provisioned with local storage. A high-speed Lustre file system is backed by 76 I/O servers.
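To show how applications typically use those Xeon Phi coprocessors on a node, here is a minimal sketch of our own (not TACC code) using the Intel compiler’s offload pragmas: the host copies the input arrays to the card, the loop runs across the coprocessor’s cores, and the result comes back when the region completes.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1000000;
    float *a = malloc(n * sizeof(float));
    float *b = malloc(n * sizeof(float));
    float *c = malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Intel compiler offload: ship a and b to the Xeon Phi card,
       run the loop there across its many cores and threads, and
       copy c back to the host when the offload region completes. */
    #pragma offload target(mic) in(a : length(n)) in(b : length(n)) \
                                out(c : length(n))
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    printf("c[%d] = %f\n", n - 1, c[n - 1]);
    free(a); free(b); free(c);
    return 0;
}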