Multi-core systems, in combination with specialized co-processors for hefty tasks, are hailed as the future of high-performance computing. In a bus-based architecture, the environment is an SMP: all of the memory is accessible by all of the processors in the same amount of time. This setup works well for a few cores, but it runs into tremendous trouble at the dozens of cores promised in the future. Resource contention in an SMP is not a new problem, and yesterday's solution still applies today: NUMA.
In a NUMA architecture, memory regions are attached to particular processors, so some memory accesses take longer than others. Of course this setup brings its own headaches, such as cache coherence (which really must be handled directly in hardware for performance reasons) and data partitioning (arranging the workload so that most accesses hit local memory rather than remote). These downsides are usually accepted simply because NUMA is the only way to achieve scalability in systems with many processors, and now many cores.
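To make the data-partitioning point concrete, here is a minimal sketch using Linux's libnuma (link with -lnuma). The node number and buffer size are illustrative assumptions on my part, not anything from a particular vendor's system: the idea is simply to allocate a buffer on one node and run the thread that touches it on that same node, so the hot path hits local memory.

    /* NUMA-aware allocation sketch: place a worker's buffer on the
     * memory node whose CPUs will actually touch it.
     * Build: gcc -O2 numa_sketch.c -lnuma  (requires libnuma headers) */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this machine\n");
            return 1;
        }

        int node = 0;                   /* illustrative: use node 0 */
        size_t len = 64 * 1024 * 1024;  /* illustrative 64 MiB buffer */

        /* Allocate physical pages on the chosen node... */
        char *buf = numa_alloc_onnode(len, node);
        if (buf == NULL) {
            fprintf(stderr, "numa_alloc_onnode failed\n");
            return 1;
        }

        /* ...and run this thread on that node's CPUs, so the
         * touch loop below is local, not a remote-node access. */
        numa_run_on_node(node);
        memset(buf, 0, len);            /* first touch commits the pages */

        printf("nodes available: %d (using node %d)\n",
               numa_max_node() + 1, node);

        numa_free(buf, len);
        return 0;
    }

Get the partitioning wrong (allocate on one node, compute on another) and every access pays the remote-memory penalty, which is exactly the choice the paragraph above is warning about.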
This is a key difference between AMD’s and Intel’s respective strategies. AMD has embraced the NUMA architecture and is proceeding with HyperTransport. Intel may do something similar in the future, but for now is sticking with SMP over a shared front-side bus. Because of AMD’s approach, several startups are building Opteron machines that lean heavily on HyperTransport. (Fabric7, PANTA Systems, and Liquid Computing also all embrace virtualization, but that is another blog post altogether.)
So the answer to bus saturation is to not have a bus at all. That is, multi-core systems call for a direct-connect architecture. The original vision of InfiniBand was exactly this, though the bloated spec and the delayed product launches quickly dashed the Trade Association’s plans for world domination. Perhaps HyperTransport and other less ambitious technologies will be the saviour of multi-core computers.