DEEP Project Unveils 500 Teraflop Prototype

DEEP Cluster nodes with front-panel connectors for the Ethernet-based service network. Power, InfiniBand, and liquid cooling connectors are embedded in the backplane to minimize maintenance overhead.

DEEP Cluster nodes with front-panel connectors for the Ethernet-based service network. Power, Infiniband, and liquid cooling connectors are embedded into the backplane for minimal maintenance overhead.

The EU-funded DEEP Project has unveiled its innovative HPC platform: a 500 TFlop/s prototype system implementing a Cluster-Booster concept that has much in common with a turbocharged engine. The prototype runs a full system software stack and programming environment engineered for performance and ease of use.

“At first DEEP was just an idea,” said Prof. Dr. Thomas Lippert, Head of Jülich Supercomputing Centre and Scientific Coordinator of the DEEP project. “A group of the most competent, dedicated and enthusiastic scientists and engineers from all over Europe, strongly supported by the European Commission, breathed life into this idea. The companies, research institutes and universities behind the consortium can all be proud of having created a unique system, which is both most generally applicable and also unimaginably scalable. The DEEP Cluster-Booster concept will become part of the future of supercomputing.”

The DEEP system achieves the highest density and energy efficiency thanks to Eurotech’s Aurora technology; it also showcases the EXTOLL HPC interconnect and leverages Intel multi- and many-core processors. Porting and optimization of applications are facilitated by adherence to standards (MPI and OpenMP) and by extensions to the task-based OmpSs model developed by Barcelona Supercomputing Center (BSC). ParaStation MPI, provided as part of ParTec’s ParaStation ClusterSuite, has been turned into a Global MPI, the key system software component linking Cluster and Booster. The system is located at Jülich Supercomputing Centre (JSC) and is fully integrated with the hardware and software infrastructure on site. Initial application results clearly show the performance and efficiency potential of the system, and JSC plans to operate the machine for several years to come and make it available to external users.

Scalability, energy efficiency, programmability, and manageability are major challenges on the way to building exascale-class supercomputers. To address them, the collaborative DEEP R&D project implemented the novel Cluster-Booster concept, a heterogeneous architecture that enables applications to always run at the right level of concurrency: highly scalable code parts profit from the throughput of the many-core Booster, while code parts with limited scalability benefit from the high per-thread performance of a conventional Cluster.
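
The trade-off behind the Cluster-Booster split can be illustrated with a toy cost model. This is a minimal sketch, not DEEP's actual scheduling logic: the partition sizes, per-core speeds, and workload numbers are hypothetical, and the model ignores the cost of moving data between Cluster and Booster.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Partition:
    """A hypothetical compute partition: core count and relative per-core speed."""
    name: str
    cores: int
    per_core_speed: float  # single-thread performance relative to a Cluster core


def phase_time(work: float, max_parallelism: int, part: Partition) -> float:
    """Runtime of one code phase that can use at most `max_parallelism` cores."""
    usable_cores = min(max_parallelism, part.cores)
    return work / (usable_cores * part.per_core_speed)


# Hypothetical numbers, chosen only to illustrate the trade-off.
CLUSTER = Partition("Cluster", cores=16, per_core_speed=1.0)
BOOSTER = Partition("Booster", cores=1024, per_core_speed=0.25)

# Each phase: (amount of work, maximum useful parallelism).
PHASES: Dict[str, Tuple[float, int]] = {
    "scalable_kernel": (1000.0, 1024),
    "limited_scalability": (10.0, 4),
}


def total_time(assignment: Dict[str, Partition]) -> float:
    """Sum of phase runtimes when each phase runs on its assigned partition."""
    return sum(
        phase_time(work, parallelism, assignment[phase])
        for phase, (work, parallelism) in PHASES.items()
    )


ALL_CLUSTER = {"scalable_kernel": CLUSTER, "limited_scalability": CLUSTER}
ALL_BOOSTER = {"scalable_kernel": BOOSTER, "limited_scalability": BOOSTER}
SPLIT = {"scalable_kernel": BOOSTER, "limited_scalability": CLUSTER}

for label, assignment in [("all on Cluster", ALL_CLUSTER),
                          ("all on Booster", ALL_BOOSTER),
                          ("split", SPLIT)]:
    print(f"{label}: {total_time(assignment):.3f}")
```

With these made-up numbers the split assignment wins (about 6.4 time units, versus 65 on the Cluster alone and about 13.9 on the Booster alone): the scalable kernel exploits the Booster's many cores, while the poorly scaling phase avoids the Booster's slower individual cores.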

DEEP Cluster rack installed at Jülich Supercomputing Centre

DEEP Cluster rack installed at Jülich Supercomputing Centre

The final DEEP system is up and running at JSC. With a peak performance of 500 TFlop/s, it uses Eurotech’s Aurora technology to achieve tight packaging (the whole system occupies less than two racks) and high energy efficiency through direct liquid cooling. The DEEP Booster tightly integrates 384 Intel Xeon Phi nodes communicating over a high-performance 3D torus network based on EXTOLL technology.
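
To make the 3D torus topology concrete, here is a minimal sketch of how nodes in such a network can be addressed. The 8 x 8 x 6 layout (which happens to give 384 nodes) is an assumption for illustration only, since the article does not state the actual dimensions of the DEEP Booster torus; the function names are likewise hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# Hypothetical torus dimensions: 8 * 8 * 6 = 384 nodes.
# The real DEEP Booster layout may differ.
DIMS: Coord = (8, 8, 6)


def to_coord(node_id: int, dims: Coord = DIMS) -> Coord:
    """Map a linear node ID to (x, y, z) torus coordinates, x varying fastest."""
    x_dim, y_dim, _ = dims
    return (node_id % x_dim,
            (node_id // x_dim) % y_dim,
            node_id // (x_dim * y_dim))


def to_id(coord: Coord, dims: Coord = DIMS) -> int:
    """Inverse of to_coord."""
    x, y, z = coord
    return x + dims[0] * (y + dims[1] * z)


def neighbors(node_id: int, dims: Coord = DIMS) -> List[int]:
    """The six direct neighbors (-1/+1 along each axis), using wraparound links."""
    coord = to_coord(node_id, dims)
    result = []
    for axis in range(3):
        for step in (-1, 1):
            shifted = list(coord)
            shifted[axis] = (shifted[axis] + step) % dims[axis]
            result.append(to_id((shifted[0], shifted[1], shifted[2]), dims))
    return result


def hop_distance(a: int, b: int, dims: Coord = DIMS) -> int:
    """Minimal hop count, taking the shorter way round each ring."""
    ca, cb = to_coord(a, dims), to_coord(b, dims)
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(ca, cb, dims))


print(neighbors(0))          # [7, 1, 56, 8, 320, 64]
print(hop_distance(0, 383))  # 3
```

The wraparound links are what distinguish a torus from a plain mesh: in this layout, nodes 0 and 383 sit at opposite corners of the grid yet are only three hops apart, and no two nodes are more than 11 hops apart.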

The Booster was designed by Eurotech in close collaboration with Intel, Heidelberg University, and Leibniz Supercomputing Centre (LRZ), with Intel providing guidance through the ExaCluster Lab at Jülich Supercomputing Centre. In addition, LRZ developed DEEP’s novel RAS (reliability, availability, and serviceability) architecture, providing advanced monitoring tools that give a holistic picture of the system status with a level of detail not previously seen in HPC machines.

To mask the relative complexity of the Cluster-Booster architecture, DEEP developed a complete software stack that features an easy-to-use and familiar programming environment for application developers, and can achieve an optimal match between hardware and application characteristics. A global MPI implementation covers both Cluster and Booster, and is based on the fully MPI-3-compliant ParaStation MPI by the Munich-based software company ParTec. On top of that, the task-based OmpSs model developed by BSC now supports the DEEP collective offload model for highly parallel kernels that use MPI. Both layers are available on a wide variety of platforms.

To prove the concept, six real-world HPC applications from science and industry were optimized for the DEEP project prototype. The work resulted in modernized versions of the codes, which are now ready to achieve high performance across a wide range of architectures. Initial results on the final DEEP system show the performance potential and clearly demonstrate the advantages of its architecture, such as its high flexibility and efficiency in using system resources.

In this video from ISC 2015, Estela Suarez from the DEEP project describes the progress her team has made in the past year.

The prototype system at JSC will be made accessible to HPC application developers outside the DEEP project. Interested researchers should contact the DEEP Project Management Team. Additionally, JSC plans to complement the current JURECA system with a 10 Petaflop Booster machine in 2016/2017.

At SC15, the DEEP project will present its work at joint booth #197 of the European Exascale projects.

See our complete coverage of SC15 * Download the Print n’ Fly Guide to SC15 in Austin