Over at the MPI Blog, Cisco’s Jeff Squyres writes that if you’re not using processor and memory affinity in your MPI programs, you’re likely experiencing performance degradation without even realizing it.
You can’t completely eliminate the amount of traffic that is flowing across the NUMA-node-connecting-network, but you do want to minimize it. Networking 101 tells us that, in many cases, reducing congestion and contention on network links leads to overall better performance of the fabric. The same principle is true on networks inside a server as it is for networks outside of a server. Using processor and memory affinity helps minimize all of the effects described above. Processes start and stay in a single location, and all the data they use in RAM tends to stay on the same NUMA node (thereby making it local). Caches aren’t thrashed. Well-behaved MPI implementations use local NICs (when available). Less inter-NUMA-node traffic = more efficient computation.
Read the Full Story.