Interview & Whitepaper: How Numascale Builds Shared Memory Clusters

While there are a number of ways to build shared memory clusters, Numascale’s hardware solution has matured with up to 256 TBytes of physical address space in its scalable on-chip switch fabric. With a newly available whitepaper penned by Douglas Eadline, the company is showing impressive performance on OpenMP and MPI benchmarks. To learn more, we caught up with Einer Rustad from Numascale.

insideHPC: Who is Numascale and who do you help?

Einer Rustad

Einer Rustad: Numascale provides technology and products for building large, scalable shared memory systems by interconnecting standard servers.

insideHPC: What is the objective of the PRACE study that is using the prototype NumaConnect cluster at University of Oslo in Norway?

Einer Rustad: The objective of PRACE is to evaluate, test and explore emerging technologies for high performance computing.

insideHPC: How large can shared memory scale to in a Numascale cluster?

Einer Rustad: It can scale to 256 TBytes of shared, cache-coherent memory.

insideHPC: How do you handle cache-coherency?

Einer Rustad: Numascale handles cache coherency in hardware through a scalable, directory-based cache coherence protocol, using the same 64-byte cache line granularity as x86 processors.
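
For a sense of scale, the two figures quoted in this interview (256 TBytes of address space and 64-byte cache lines) can be combined with some simple arithmetic. The sketch below assumes "TBytes" means binary terabytes (2^40 bytes); the variable names are illustrative and say nothing about how Numascale's directory is actually organized.

```python
# Assumption: 1 TByte = 2**40 bytes (binary terabyte).
TOTAL_BYTES = 256 * 2**40   # 256 TBytes of shared physical address space
LINE_BYTES = 64             # x86 cache line size quoted in the interview

# TOTAL_BYTES is an exact power of two, so bit_length() - 1 gives log2.
address_bits = TOTAL_BYTES.bit_length() - 1

# Number of distinct cache lines in that address space.
cache_lines = TOTAL_BYTES // LINE_BYTES
line_index_bits = cache_lines.bit_length() - 1

print(f"physical address bits: {address_bits}")
print(f"cache lines addressable: 2**{line_index_bits} = {cache_lines}")
```

Under that assumption, the address space corresponds to a 48-bit physical address and 2^42 individually addressable cache lines.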

insideHPC: What is the advantage of a hardware-based solution for aggregating server resources vs. software solutions?

Einer Rustad: The hardware solution requires no extra buffer space in main memory; it operates at 64-byte cache line granularity as opposed to page-level granularity (4k or 2MB per page). This reduces the probability of false sharing by a factor of 64 for a 4k page size and 32768 for a 2MB page size. A software-based solution will require a page fault for every access to a page outside the memory of the machine where the process resides, and the page containing the target address must be moved to that machine before the process can resume. Real random access therefore poses a challenge. Handling semaphores at page-level granularity is challenging for the same reason, and semaphores are required to enforce synchronization between processes that exchange data in multiprocessing programs.
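
The factors of 64 and 32768 quoted above are simply the number of 64-byte cache lines that fit in one page. A minimal sketch of that arithmetic follows, assuming "4k" and "2MB" mean 4 KiB and 2 MiB; the function name `lines_per_page` is hypothetical and used only for illustration.

```python
CACHE_LINE_BYTES = 64  # coherence granularity of the hardware approach


def lines_per_page(page_bytes, line_bytes=CACHE_LINE_BYTES):
    """Return how many cache lines share a single page of the given size."""
    if page_bytes % line_bytes != 0:
        raise ValueError("page size must be a multiple of the cache line size")
    return page_bytes // line_bytes


# Assumption: "4k" = 4 KiB and "2MB" = 2 MiB.
small_page_factor = lines_per_page(4 * 1024)
huge_page_factor = lines_per_page(2 * 1024 * 1024)

print(f"4 KiB page: {small_page_factor} cache lines per page")
print(f"2 MiB page: {huge_page_factor} cache lines per page")
```

With page-level sharing, two processes touching any of those lines contend for the same unit of coherence; at cache line granularity they only contend when they touch the same 64 bytes.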


insideHPC: How can our readers access Numascale resources and try this technology out for themselves?

Einer Rustad: Numascale has set up a demo system where users can apply for time and try out their own programs and benchmarks under the supervision of Numascale personnel.

For more information on the Numascale Big Memory solution, download the whitepaper.