What is the difference between AMD's Stream Processor and NVIDIA's GeForce 8800? (Or, is Cray's strategy the right one after all?)

AMD has announced a Stream Processor that comes from its recent acquisition of ATI. The processor is currently available on a PCI Express board and is provided with one gigabyte of dedicated memory. It also comes with the Close to Metal (CTM) interface for software developers. CTM is the target of stream programming platforms such as PeakStream and RapidMind, though its open nature allows it be targeted by in-house developers.

The Stream Processor is different from the CUDA technology in the GeForce 8800 in that the latter has cooperating cores and can therefore run multithreaded applications without stream programming. That is, AMD’s approach is a vector processor—SIMD—whereas NVIDIA’s approach is a multithreaded processor—MIMD. (To be precise, a stream processor applies a “kernel” of related instructions stored in a cache, whereas a vector processor applies a single instruction stored in a register; for our discussion, the difference is minimal.) This SIMD vs. MIMD divide also appears when comparing ClearSpeed and the Cell BE.

It is interesting to note that the offer of vector processors and multithreaded processors matches Cray’s adaptive supercomputing strategy. (Cray also offers FPGAs, which have been the focus of Celoxica and DRC.) And the CPU behind all of this is the x86; AMD’s offerings are currently being favored over Intel because of the direct connect architecture.

Cray might have the satisfaction of being right, but they still need to worry about market penetration before the smugness settles in. The other vendors have the benefit of commoditization, which is the exact force that removed Sun from being the leader in enterprise computing. Third-party OEMs have already announced the inclusion of the Stream Processor at Supercomputing this week. Can Cray keep up with that amount of volume?

One interesting side note I’d like to close with: while contemplating the SIMD and MIMD issues, I realized that the x86 vendors already have a watered-down version of both of these, namely SSE and multi-core architectures. It appears that Flynn’s taxonomy still rings true today; everyone is rushing to add these components to CPUs, either on-chip or along-side.