Last week HPCwire’s editor Michael Feldman commented on the memory wall and what may lie on the horizon for HPC:
The Nehalem processors, though, should provide some relief — if temporarily. The soon-to-be-released quad-core EP chips for two-socket servers will have integrated DDR3 memory controllers, which Intel claims will bump memory bandwidth by 300-400 percent compared to the current “Penryn” class Xeon processors. …Unfortunately, after Nehalem, Intel probably won’t be able to duplicate another memory performance increase of similar magnitude for some time. DDR4 will have perhaps twice the raw performance of DDR3, but is not expected to show up until 2012.
…GPUs are a different story though. These chips are all about data parallelism, so the memory architecture was designed for parallel throughput from the get-go. …Today, you can get an NVIDIA Tesla GPU with 4 GB of (GDDR3) memory at 102 GB/second of bandwidth. Granted this is graphics memory, so you have to deal with the lack of error correction, but at roughly three times the memory performance available to a Nehalem processor, GPUs can offer some respite from the memory wall.
Oddly, though, none of this is good news:
Despite the meteoric rise of GPUs in the general-purpose computing world over the last couple of years, most HPC users are still using x86-based clusters. According to IDC, less than 10 percent of the HPC user sites they surveyed were using alternative processors (most of which, I assume, were GPUs and Cell processors), and they didn’t see those numbers changing dramatically in the near term.
But the memory wall will be unrelenting. The eight-core Nehalem EX chip is in the works and is expected to show up in the second half of 2009. At eight cores, memory-intensive apps might be a poor fit for this platform. It was at the eight-core mark that the Sandia study saw an actual decrease in performance. There’s plenty of anecdotal evidence that a variety of HPC applications are seeing declining application performance as they migrate from just two to four cores.