Today EXTOLL in Germany released its new TOURMALET high-performance network chip for HPC.
The key demands of HPC are high bandwidth, low latency, and high message rates. The TOURMALET PCI-Express gen3 x16 board shows an MPI latency of 850 ns and a message rate of 75M messages per second. The measured message rate is limited by the host CPU; TOURMALET itself is designed for well above 100M messages per second.
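To put these figures in perspective, the short Python sketch below converts a message rate into the average time available per message and estimates how many messages would have to be in flight to sustain that rate. It is a back-of-the-envelope illustration only: the function names are hypothetical, and the in-flight estimate assumes steady-state pipelining (Little's law) with the quoted 850 ns latency applying to every message, which the figures above do not state.

```python
def ns_per_message(rate_msgs_per_s: float) -> float:
    """Average time between messages, in nanoseconds, at a given message rate."""
    return 1e9 / rate_msgs_per_s


def messages_in_flight(rate_msgs_per_s: float, latency_ns: float) -> float:
    """Little's law estimate: messages outstanding = rate x latency.

    Assumption (not from the announcement): steady state, and every
    message sees the same end-to-end latency.
    """
    return rate_msgs_per_s * latency_ns * 1e-9


if __name__ == "__main__":
    measured_rate = 75e6   # messages per second (CPU-limited figure)
    design_rate = 100e6    # messages per second (lower bound of design target)
    mpi_latency_ns = 850   # quoted MPI latency

    print(f"Time per message at 75M msg/s:  {ns_per_message(measured_rate):.1f} ns")
    print(f"Time per message at 100M msg/s: {ns_per_message(design_rate):.1f} ns")
    print(f"Estimated messages in flight:   {messages_in_flight(measured_rate, mpi_latency_ns):.0f}")
```

At 75M messages per second the budget is roughly 13 ns per message, far below the 850 ns latency, so under these assumptions such a rate is only reachable when many messages overlap.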
HPC applications typically require both scalar and vector performance for different parts of a problem. Within the scope of the DEEP project, the Juelich Supercomputing Center has set up a tandem of a cluster (scalar) and a booster (vector) system, where EXTOLL provides the network technology for the booster part. The special features of EXTOLL’s TOURMALET enable direct attachment of the Intel Xeon Phi (Knights Corner) accelerators to the network without the need for a host CPU per node.
In the successor project DEEP-ER, EXTOLL network technology will form a homogeneous network spanning both the cluster and the booster parts of the system. As in DEEP, applications may arbitrarily and dynamically assign any number of accelerators to any number of CPUs. The difference is that the translation between cluster and booster network protocols required in DEEP is no longer needed, and the communication bandwidth between the two sides of the system will be increased. Additionally, the DEEP-ER booster will be built with second-generation, self-booting Intel Xeon Phi processors, code-named Knights Landing, which are already supported by the EXTOLL TOURMALET board.
Two more types of accelerators are suitable for HPC applications, depending on the preferred application profile: GPUs and FPGAs. EXTOLL very recently managed to directly attach NVIDIA’s Tesla GPU to the network. For FPGAs, EXTOLL provides implementations and boards based on Xilinx Virtex-7. This makes the EXTOLL network the first choice for HPC clusters with both scalar and vector performance and offers the following benefits to the end user:
- Cost savings: No host CPU per node required. No central switches required.
- Speed: Low latency and high message rates for HPC
- Flexibility: Accelerators can be attached without changing the network topology
- Universality: All kinds of accelerators are supported
Visit EXTOLL at ISC 2016, booth #610, in Frankfurt.