A new feature in Information Week looks at the subtle memory-access issues that x86 developers must consider as they work with multithreaded software, including how to avoid pipeline stalls:
When writing a parallel program for performance, first get the decomposition right. Then tune for cache usage, including avoidance of false sharing. Then concern yourself with the processor’s pipeline. Many x86 processors have deep pipelines that permit high execution rates of instructions typical of single-threaded code. The execution units reorder instructions so those waiting on memory accesses don’t block others. Deep pipelines and out-of-order execution are usually a good thing, but they make some operations relatively expensive in terms of performance.
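The excerpt says to avoid false sharing before tuning for the pipeline. Here is a minimal C++17 sketch of the usual fix. It makes three assumptions: a 64-byte cache line (common on current x86 processors, but check your target), a hypothetical workload `count_multiples_of_3`, and a hypothetical helper type `PaddedCounter`. Each thread accumulates into a local variable. It then writes its result once, to a slot aligned to its own cache line. As a result, no two threads repeatedly write to the same line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. C++17 also declares
// std::hardware_destructive_interference_size, but not every standard
// library provides it, so a fixed constant is used here.
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: alignas pads each counter to a full cache line, so a
// thread writing its own slot does not invalidate another thread's slot.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Hypothetical workload: count the multiples of 3 in [0, n), splitting the
// range into one contiguous chunk per thread.
std::uint64_t count_multiples_of_3(std::uint64_t n, unsigned num_threads) {
    assert(num_threads > 0);
    // C++17 aligned allocation makes std::vector honor alignas here.
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    const std::uint64_t chunk = n / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::uint64_t begin = t * chunk;
        // The last thread also takes any remainder of the division.
        const std::uint64_t end = (t + 1 == num_threads) ? n : begin + chunk;
        workers.emplace_back([&counters, t, begin, end] {
            std::uint64_t local = 0;       // hot loop touches no shared data
            for (std::uint64_t i = begin; i < end; ++i)
                if (i % 3 == 0) ++local;
            counters[t].value = local;     // single write to a private line
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

For example, `count_multiples_of_3(1000000, 4)` should return 333334. If the counters were packed into a plain `std::uint64_t` array and incremented inside the loop, the result would be the same. However, the threads would keep bouncing a shared cache line between cores, which is the false sharing the excerpt warns about.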