In this slidecast, Einar Rustad from Numascale describes how the company set a world record on the McCalpin STREAM benchmark using its scale-out to scale-up architecture. The benchmark measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
Numascale’s cache-coherent shared memory system, which is targeted at big data analytics, reached 10.06 TBytes/second on the STREAM Scale kernel. That result is 53% higher than the next-highest system on the list, which achieved 6.59 TBytes/second.
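To make the Scale figure concrete, here is a minimal Python sketch of a STREAM-style Scale kernel and its bandwidth accounting. It is not the benchmark itself (the reference STREAM code is written in C and Fortran), and the function names are hypothetical, chosen for illustration. Scale multiplies each element of one array by a constant and stores the result in another, so each element is counted as one 8-byte read plus one 8-byte write. The last helper reproduces the 53% comparison quoted above.

```python
import time

BYTES_PER_DOUBLE = 8  # STREAM operates on 8-byte double-precision values


def stream_scale(src, scalar):
    """STREAM-style Scale kernel: dst[i] = scalar * src[i]."""
    return [scalar * x for x in src]


def scale_bytes_moved(n):
    """Bytes counted for Scale: one read and one write per element."""
    return 2 * BYTES_PER_DOUBLE * n


def measure_scale_bandwidth(n=1_000_000, scalar=3.0, trials=5):
    """Return bytes/second for the fastest of several trials.

    In pure Python this mostly measures interpreter overhead,
    not the memory system, so treat the number as illustrative only.
    """
    src = [1.0] * n
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        stream_scale(src, scalar)
        best = min(best, time.perf_counter() - start)
    return scale_bytes_moved(n) / best


def percent_higher(a, b):
    """How much higher a is than b, in percent."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    print(f"Scale bandwidth: {measure_scale_bandwidth() / 1e9:.2f} GB/s")
    print(f"10.06 vs 6.59 TBytes/s: {percent_higher(10.06, 6.59):.0f}% higher")
```

Keeping the best time across trials reduces the effect of one-off noise such as first-touch page allocation.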
Numascale’s system consists of 108 Supermicro 1U servers connected in a 3D torus via the company's NumaConnect interconnect technology. Three cabinets with 36 servers apiece were used in a 6x6x3 topology. Each server has 48 cores in three AMD Opteron 6386 CPUs and 192 GBytes of memory. Together the servers run as a single system image in which all 5,184 cores share 20.7 TBytes of memory. The system was designed to meet requirements for “very large memory” hardware solutions running a standard single-image Linux OS on commodity x86-based servers.
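The configuration figures above can be checked with a few lines of arithmetic. The sketch below recomputes the server, core, and memory totals from the 6x6x3 topology and also lists a node's neighbors in a generic 3D torus with wraparound links. The coordinate scheme and function names are assumptions made for illustration; the article does not say how NumaConnect numbers or routes between nodes. The memory total uses decimal units (1 TByte = 1,000 GBytes), which is what matches the quoted 20.7 TBytes.

```python
from math import prod

TORUS_DIMS = (6, 6, 3)     # topology quoted in the article
CORES_PER_SERVER = 48      # three 16-core CPUs per server
MEM_GB_PER_SERVER = 192


def system_totals(dims=TORUS_DIMS,
                  cores_per_server=CORES_PER_SERVER,
                  mem_gb_per_server=MEM_GB_PER_SERVER):
    """Aggregate server, core, and memory counts for the whole torus."""
    servers = prod(dims)
    return {
        "servers": servers,
        "cores": servers * cores_per_server,
        "memory_tb": servers * mem_gb_per_server / 1000,  # decimal TBytes
    }


def torus_neighbors(coord, dims=TORUS_DIMS):
    """Neighbors of a node in a 3D torus (hypothetical coordinate scheme).

    Each dimension wraps around, so every node has a +1 and a -1
    neighbor along each axis.
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.add(tuple(n))
    neighbors.discard(tuple(coord))  # only relevant if a dimension has size 1
    return sorted(neighbors)


if __name__ == "__main__":
    print(system_totals())
    print(torus_neighbors((0, 0, 0)))
```

Because every dimension here has at least three nodes, each server has six distinct torus neighbors.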