Doug Eadline looks at this question in a post at Cluster Connection from yesterday. In it he gives a lucid description of the difference between user space and kernel space communications, and why that difference impacts performance:
When interconnects are used in HPC the best performance comes from a “user space” mode. Communication over a network normally takes place through the kernel (i.e. the kernel manages, and in a sense guarantees, that data will get to where it is supposed to go). This communication path, however, requires memory to be copied from the user’s program space to a kernel buffer. The kernel then manages the communication. On the receiving node, the kernel will accept the data and place it in a kernel buffer. The buffer is then copied to the user’s program space. The excess copying often adds to the latency for a given network. In addition, the kernel must process the TCP/IP stack for each communication. For applications that require low latency, the extra copying from user program space to kernel buffer on the sending node, and then from kernel buffer to user program space on the receiving node, can be very inefficient.
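To make the kernel path Doug describes concrete, here is a minimal sketch of a plain TCP send in C. The host address and port are placeholders of my own, not anything from his post; the point is simply that send() hands the buffer to the kernel, which copies it and then runs the TCP/IP stack on it.

/* Illustrative sketch of the kernel-mediated path: a plain TCP send(),
 * where the kernel copies the user buffer into a socket buffer and then
 * processes the TCP/IP stack. Host and port below are placeholders. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5000);                      /* placeholder port */
    inet_pton(AF_INET, "192.168.1.10", &peer.sin_addr); /* placeholder host */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    char msg[] = "hello";
    /* send() returns once the kernel has copied msg into its own socket
     * buffer; the TCP/IP processing here, and the matching copy out of a
     * kernel buffer on the receiver, both happen in kernel space. */
    if (send(fd, msg, strlen(msg), 0) < 0)
        perror("send");

    close(fd);
    return 0;
}

A user space interconnect avoids exactly these two kernel-buffer copies by letting the network hardware move data directly to and from application memory.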
As you probably already know, the canonical example of a user space interconnect today is InfiniBand (Myrinet is in there too, though less popular). Ethernet is the canonical example of a kernel space interconnect. Developers and application engineers running apps don't normally need to care about the particulars of why this works, but it's good knowledge to keep in the old back pocket, and Doug does a fine job explaining it.
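One reason application folks can stay blissfully unaware is that the communication library hides the interconnect. As a rough illustration (my own sketch, not from Doug's post), the MPI ping-pong below runs unchanged whether the MPI library routes it over InfiniBand in user space or over Ethernet through the kernel's TCP stack; only the latency changes.

/* Minimal MPI ping-pong between ranks 0 and 1. The same MPI_Send/MPI_Recv
 * calls work over a user space interconnect or over kernel-managed
 * Ethernet; the interconnect choice is the MPI library's problem. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int token = 0;
    if (rank == 0) {
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0 got token %d back\n", token);
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}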
Van Jacobsen did an excellent presentation at LCA2006 in Dunedin, NZ, on optimising the Linux TCP stack and ended up with a system that was limited by the bandwidth of RAM, not the speed of the CPU. For more see the LWN article on the talk – http://lwn.net/Articles/169961/ – as well as VJ’s original slides – http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
s/Jacobsen/Jacobson/
D’oh – I always do that.. 🙁