EPFL and ClusterVision define, design, and benchmark Deneb.
École Polytechnique Fédérale de Lausanne (EPFL) is one of the premier technical universities in Europe. Students and researchers come from around the world to EPFL to work on a wide range of projects leading to discoveries that are changing how scientists understand diseases, develop new materials, look at the stars, design and construct stronger and lighter aircraft, and grasp the deepest operations of the human brain. Many of these projects require supercomputing resources to test algorithms, analyze data, and simulate and visualize models.
The Scientific IT and Application Support (SCITAS) unit at EPFL is responsible for understanding the computing needs of researchers and students, searching for the best solutions, implementing and deploying the resources in an optimized manner, and supporting them throughout their lifetime. EPFL clusters run a large variety of applications, from in-house codes to open source applications (for example, Quantum ESPRESSO) to which EPFL researchers often contribute. The clusters are also used to run commercial codes. The different applications can be CPU-, memory-, or I/O-intensive. Most of the applications use MPI as their parallelization library.
We add new clusters to the supercomputing facility every few years. Our most recent acquisition, Deneb, will be our largest and most powerful system when completed: it starts with 376 nodes, scales to 512 nodes, and is built on Intel® Xeon® processors. The cluster went into production in January 2015. Deneb joins two other Intel-based clusters at EPFL, Castor and Bellatrix.
In designing Deneb, we looked at several criteria. In addition to meeting the computational requirements, Deneb needed to support a large pool of job types that could consume anything from just a few cores to a large fraction of the machine. That’s because, while the research done at EPFL is cutting edge, we do not run only exotic applications. We run classic HPC problems, too, like fluid dynamics and cryptography, and we also serve a number of other computational disciplines, which has led to an explosion in the diversity of the resources researchers and students require. This means that, unlike a workstation or a supercomputer that is highly optimized to run certain kinds of codes, Deneb needed to be easily adaptable to a variety of applications around a general, predefined configuration. Thus, in addition to fairly standard academic criteria, such as purchase and operating costs over the system's lifetime, performance, and power efficiency, flexibility was a key requirement in our research and selection process.
SCITAS evaluated different hardware configurations. We worked with ClusterVision because of their expertise in HPC to find the right combination of components that would meet our needs. They defined and configured the system, and then they ran the required benchmarks, which SCITAS later reproduced on Deneb. Their work was instrumental in achieving our goals. From the benchmarks, we were particularly interested in time to solution and the power consumed to reach the solution, because HPC systems are the largest power consumers in our data center.
Since we mainly use MPI-based codes, and sought high performance and low power consumption, Christopher Huggins, ClusterVision’s Commercial Director, and his design team recommended both the Intel® Xeon® processor E5-2650 v2 and the Intel® True Scale Fabric. The processor is extremely fast and power efficient for the kinds of projects EPFL researchers and students run, and the Intel Fabric is highly tuned for MPI. The ClusterVision team showed us how the adapter card is designed specifically to boost message passing across the network. ClusterVision benchmarked QE, CPMD, a finite element code, and GEAR, an astrophysics code based on GADGET2 (N-body simulations).
The recommended configuration performed very well on memory throughput. As usual, we did not need to boost processor frequencies to achieve the results we wanted, which means the system can also perform well on power consumption. We found that the cluster runs well under different configurations, fitting our need for flexibility across different job types. The system met all of our demands for performance, power consumption, and flexibility.
By Vittoria Rezzonico, Executive Director SCITAS, École Polytechnique Fédérale de Lausanne.