It’s been almost a month since SC10 ended, and I’m still catching up on all the interviews I did at the show. And as I go through my notes, one theme keeps popping up: tackling complexity.
Every season, HPC users have to contend with more nodes, cores, and threads if they want to scale application performance. And while the programming tools are coming along, parallel programming at this level is not the forte of many scientists who just want to get their work done.
Enter Numascale, the Norwegian technology company that lets you build an SMP out of commodity Opteron servers using its low-latency, cache-coherent NUMA (ccNUMA) shared memory interconnect. It may sound complicated, but according to Numascale’s Einar Rustad, it’s all about programmer productivity.
“Effectively, what we deliver to the end user is an SMP with scalable, cache-coherent shared memory. With a single system image, it’s much easier for users to program, analyze, and optimize their code. In my experience, MPI programs tend to have twice as many lines of code as are needed on a shared memory machine. You need PhD-level guys to write that kind of message-passing code, but undergrads can easily handle coding for an SMP.”
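To make the productivity argument a little more concrete, here is a minimal sketch (my own illustration, not Numascale code) of a parallel sum written for a shared memory machine with OpenMP, a common way to program SMPs. It assumes an OpenMP-capable C compiler; the function name `shared_sum` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical example: sum an array on a shared memory system.
   Every thread reads the same array directly from shared memory, and
   the OpenMP reduction clause combines the per-thread partial sums,
   so no explicit messages are exchanged.
   Build with -fopenmp (GCC/Clang); without it the pragma is ignored
   and the loop simply runs serially with the same result. */
double shared_sum(const double *x, size_t n)
{
    double total = 0.0;

    /* Split loop iterations across threads; each thread keeps a private
       copy of total, and the copies are added together at the end. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += x[i];
    }
    return total;
}
```

An MPI version of the same computation would first have to hand each rank its own slice of the array (for example with `MPI_Scatter`), compute a local sum on every rank, and then combine the results with `MPI_Reduce`. That extra data-distribution bookkeeping is the kind of code Rustad is talking about.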
Rustad went on to say that another advantage of the Numascale SMP solution is that it can run any OS for the x86 architecture, including unmodified distributions of Linux. From the OS’s point of view, it’s just a bigger machine. How big? With Numascale, you can build a system with up to 256 Tbytes of DRAM. And to speed up applications by optimizing data locality, each Numascale card has up to 4 Gbytes of cache for storing remote data. This hardware cache-based approach offers significant performance advantages over software-based shared memory solutions.
Call me impressed. Check out Doug Eadline’s whitepaper on Numascale technology for a closer look.