InsideTrack: NVIDIA Fermi Performance with CULA


Some real benchmarks on the latest NVIDIA Fermi gear landed this morning.  We’ve all heard the technical news about NVIDIA’s latest silicon, but not a whole lot about how it handles real workloads.  We were tipped off to a fresh blog post from the nice folks at EM Photonics.  They’re in the biz of packaging mathematics libraries, called CULA, geared toward the NVIDIA platform, and their latest release is CULA version 1.3a.  Now that the release is out in the wild, they have focused some engineering time on beginning to port and adapt CULA to NVIDIA’s Fermi platform.  They posted the first series of benchmarks on their company blog.

Hot off the heels of a 1.3a service release, we’ve got some brand new information on the future directions of CULA.  Today we’ll be talking about Fermi, NVIDIA’s next-generation GPU architecture that was announced in September at the GPU Technology Conference.  At that time, we shared our thoughts on the new and exciting performance we hoped Fermi would bring.  After 6 months of anticipation, we’re very proud today to debut the first performance results for CULA running on Fermi.  To our knowledge, these results are the first published double-precision performance results for Fermi running real-world code. [Kyle Spagnoli]

With only a few compiler flags and some driver upgrades, the engineers at EM Photonics were able to achieve some very tasty speedups on standard linear algebra factorizations.  Specifically, they posted double-precision numbers for LU decomposition (DGETRF) and QR decomposition (DGEQRF).

As you can see, Fermi is no slouch!  We’re reporting performance gains for doubles up to 3x over the previous generation of Tesla GPUs.  It’s also very important to note that these gains are achieved with no Fermi-specific optimizations added — these are practically plug-and-play performance enhancements.  We have every expectation that with a little time and effort we can improve significantly upon these already impressive numbers.
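For readers who have not met the routines named above: DGETRF is the LAPACK name for double-precision LU factorization with partial pivoting, and DGEQRF is its QR counterpart.  The sketch below is a plain-Python illustration of what an LU factorization with partial pivoting computes.  It is not CULA’s code and says nothing about how CULA runs on the GPU; the function name `lu_partial_pivot` is hypothetical, and the sketch assumes a square matrix, whereas the real routine also accepts rectangular input.

```python
def lu_partial_pivot(a):
    """Illustrative LU factorization with partial pivoting (square input only).

    Returns (lu, piv). lu packs L (unit lower triangle, diagonal implied)
    and U (upper triangle) into one matrix, the layout DGETRF uses.
    piv[k] is the row swapped with row k at step k (0-based here;
    LAPACK reports 1-based indices).
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    piv = []
    for k in range(n):
        # Choose the row with the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        piv.append(p)
        if lu[p][k] == 0.0:
            # Assumption of this sketch: treat an exact zero pivot as an error.
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in L's slot
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # update trailing block
    return lu, piv


if __name__ == "__main__":
    lu, piv = lu_partial_pivot([[4.0, 3.0], [6.0, 3.0]])
    print(lu, piv)
```

The innermost update loop is where nearly all of the arithmetic happens for large matrices, which is why double-precision throughput matters so much for benchmarks like these.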

Rest assured that the folks from EM Photonics will be tweaking the latest 1.3a release and optimizing performance for the latest in NVIDIA silicon.  Check out the original blog post here.

