Over the email transom today, news that NVIDIA has released the latest version of its CUDA toolkit. There are a lot of changes; here are a few from the list
- The CUFFT Library now supports double-precision transforms and includes significant performance improvements for single-precision transforms as well.
- The CUDA-GDB hardware debugger and CUDA Visual Profiler are now included in the CUDA Toolkit installer, and the CUDA-GDB debugger is now available for all supported Linux distros. (see below)
- Each GPU in an SLI group is now enumerated individually, so compute applications can now take advantage of multi-GPU performance even when SLI is enabled for graphics.
- The 64-bit versions of the CUDA Toolkit now support compiling 32-bit applications. Please note that the installation location of the libraries has changed, so developers on 64-bit Linux must update their LD_LIBRARY_PATH to contain either /usr/local/cuda/lib or /usr/local/cuda/lib64.
- New support for fp16 <-> fp32 conversion intrinsics allows storage of data in fp16 format with computation in fp32. Use of fp16 format is ideal for applications that require higher numerical range than 16-bit integer but less precision than fp32 and reduces memory space and bandwidth consumption.