Numascale Sets New World Record on STREAM Benchmark

Today Numascale announced record-breaking results from a shared memory system running the McCalpin STREAM Benchmark, a synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels. Numascale’s cache coherent shared memory system, which targets big data analytics, reached 10.06 TBytes/second for the Scale function. That result is 53% higher than the next most scalable system on the list, which achieved 6.59 TBytes/second.
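For readers unfamiliar with the benchmark, the following is a minimal Python sketch of what the Scale function computes and how a bandwidth figure follows from it. It is an illustration only, not the official STREAM code (which is written in C and Fortran). The function names `scale`, `scale_bandwidth`, `measure_scale`, and `relative_gain` are hypothetical, and a pure-Python loop mostly measures interpreter overhead, so the number it reports says nothing about real hardware.

```python
import time
from array import array

BYTES_PER_WORD = 8  # STREAM operates on 8-byte double-precision elements


def scale(b, c, scalar):
    """STREAM Scale kernel: b[j] = scalar * c[j] for every element."""
    for j in range(len(c)):
        b[j] = scalar * c[j]


def scale_bandwidth(n, seconds):
    """Bytes per second for Scale: one array of n words read, one written."""
    return 2 * BYTES_PER_WORD * n / seconds


def measure_scale(n=100_000, scalar=3.0):
    """Time one pass of the kernel and return the implied bytes per second."""
    c = array("d", [1.0]) * n
    b = array("d", [0.0]) * n
    start = time.perf_counter()
    scale(b, c, scalar)
    elapsed = time.perf_counter() - start
    return scale_bandwidth(n, elapsed)


def relative_gain(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0


# The two Scale results quoted in the article, in TBytes/second.
print(f"gain over runner-up: {relative_gain(10.06, 6.59):.1f}%")
```

The last line reproduces the comparison in the announcement: the gap between 10.06 and 6.59 TBytes/second rounds to the 53% quoted above.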

Numascale’s record-breaking system is the first part of a large cloud computing install at a North American customer data center facility for the analytics and simulation of sensor data combined with historical data. The system is being used to run analytic models that simulate complex dynamic behavior in a certain supply chain. Its data sets are large, and the model uses both historical data and close to real-time information to predict behavior. The size of the data sets requires a large memory with short access times in order to complete computations within deadlines.

The customer data center’s analysis evaluates location placement, megawatt sizing, and energy services mix in order to determine the greatest optimization and efficiency gains from the integration of banks that store and deliver energy to an electric grid. In similar fashion, Numascale’s technology has numerous Smart City applications, such as traffic analysis, where 24×7 real-time streaming data from thousands of sensors aids decisions and actions that need to be made in real time.

Running all calculations compiled from disparate data sources, both structured and unstructured, in a timely manner requires significant computing power and a large shared memory. Numascale’s STREAM results indicate that the total bandwidth of the system is capable of supporting large parallel workloads. The STREAM benchmark is specifically designed to test datasets much larger than the available cache on any given system, so its results are indicative, to some degree, of the performance of very large, vector-style applications.

Numascale’s system consists of 108 Supermicro 1U servers connected in a 3D torus via the company’s NumaConnect™ Interconnect technology. Three cabinets with 36 servers apiece were used in a 6x6x3 topology. Each server has 48 cores in three AMD Opteron™ 6386 CPUs and 192 GBytes of memory, providing a single system image with 20.7 TBytes of memory accessible to all 5,184 cores. The system was designed to meet requirements for “very large memory” hardware solutions running a standard single image Linux OS on commodity x86-based servers.
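The configuration figures above can be checked with a few lines of arithmetic. The Python sketch below also shows, in general terms, what a 3D torus means: each node links to its two neighbors along each axis, with wraparound at the edges. The coordinate scheme and the helper `torus_neighbors` are illustrative assumptions, not a description of how NumaConnect actually numbers or routes between nodes.

```python
DIMS = (6, 6, 3)            # torus dimensions quoted in the article
CORES_PER_SERVER = 48
MEMORY_PER_SERVER_GB = 192

SERVERS = DIMS[0] * DIMS[1] * DIMS[2]
TOTAL_CORES = SERVERS * CORES_PER_SERVER
TOTAL_MEMORY_GB = SERVERS * MEMORY_PER_SERVER_GB


def torus_neighbors(node, dims=DIMS):
    """Return the six neighbors of `node` in a 3D torus with wraparound.

    Hypothetical helper: nodes are addressed as (x, y, z) tuples.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(coord))
    return neighbors


print(SERVERS, "servers,", TOTAL_CORES, "cores")
# Assumes decimal units (1 TByte = 1000 GBytes), which matches the 20.7 figure.
print(f"{TOTAL_MEMORY_GB / 1000:.1f} TBytes of shared memory")
print(torus_neighbors((0, 0, 0)))
```

Under these assumptions the totals come out to 108 servers, 5,184 cores, and 20,736 GBytes, consistent with the numbers in the announcement.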

NumaConnect enables scalable server computer systems to be built from commodity components at cluster prices, while providing high performance shared memory programming capabilities. The Interconnect technology eliminates the difficulty of MPI coding for big data problems and typically increases programmer productivity.

“This alternative represents a compelling solution for scientists who currently work with shared memory codes on x86 desktops and laptops,” said Einar Rustad, CTO of Numascale. “These users can now scale up their data sets without any extra effort within a familiar, standard Linux OS environment. With NumaConnect, system administration is identical to that of a single server because there are no separate node images to maintain and distribute.”

In Numascale’s record-breaking system, NumaConnect provides a total physical address space of up to 256 TBytes of system-wide shared memory. It does so using cache coherency logic with a directory-based protocol that scales to 4096 nodes, providing 196,608 cores. In running this STREAM benchmark, Numascale’s system did not use all of its cores: memory channels are better utilized when a single core drives each memory controller, which avoids arbitration between different cores and provides optimum memory bandwidth.
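The scaling limits quoted above follow from simple arithmetic, sketched here in Python. Two assumptions are made that the announcement does not state: that the maximum configuration uses the same 48-core servers as the record system, and that a TByte is read as 2**40 bytes when converting the address space to a bit width.

```python
MAX_NODES = 4096
CORES_PER_NODE = 48          # assumed: same servers as the record system
ADDRESS_SPACE_TBYTES = 256


def max_cores(nodes=MAX_NODES, cores_per_node=CORES_PER_NODE):
    """Core count of a fully populated system."""
    return nodes * cores_per_node


def address_bits(tbytes=ADDRESS_SPACE_TBYTES):
    """Bits needed to address `tbytes`, assuming 1 TByte = 2**40 bytes."""
    return (tbytes * 2**40 - 1).bit_length()


print(max_cores(), "cores at", MAX_NODES, "nodes")
print(address_bits(), "physical address bits for", ADDRESS_SPACE_TBYTES, "TBytes")
```

With those assumptions, 4096 nodes of 48 cores give the 196,608 cores cited, and 256 TBytes corresponds to a 48-bit physical address.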

For this install, Numascale will deliver a training session to teach best practice software design methods that take full advantage of its unique architecture. The company has signed a development agreement whereby Numascale will co-develop future software solutions with the data center.

Visit Numascale in booth 1124 at ISC 2015 to see a demo of Numascale’s Smart City solution.
