Today Nvidia released CUDA 6.5, the “latest version of the world’s most pervasive parallel computing platform and programming model.” Available as a free download, version 6.5 of the CUDA Toolkit brings the power of GPU-accelerated computing for the first time to 64-bit ARM platforms. It also enables a newly expanded range of efficient, high-performance computing options to accelerate compute-intensive HPC and enterprise data center workloads.
CUDA 6.5 provides programmers with a robust, easy-to-use platform to develop advanced scientific, engineering, mobile and HPC applications on GPU-accelerated ARM and x86 CPU-based systems.
Additional performance and productivity features of the CUDA 6.5 platform include:
- Support for Microsoft Visual Studio 2013 – Expands host compiler support to include Microsoft Visual Studio 2013 for Windows.
- cuFFT callbacks capability – Delivers higher performance custom processing on input or output data by enabling programmers to specify callback functions that manipulate data in GPU memory before and after FFT processing.
- Improved debugging for CUDA FORTRAN applications (preview) – Includes new debugging support for FORTRAN arrays (Linux only), improved source-to-assembly code correlation, and improved documentation.
- Application Replay mode in Visual Profiler and nvprof – Enables faster analysis of complex scenarios using multiple hardware counters.
- Updated CUDA Occupancy Calculator API – Frees programmers from having to manually configure kernel launches for each GPU architecture.
- New “nvprune” utility – Prunes object files and libraries so that they contain only the device code needed for the specified target architectures, reducing application size and improving load-time performance.
- BSR sparse matrix format in cuSPARSE routines – Support for Block Sparse Row matrix format added to more sparse matrix operations.
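To make the occupancy item in the list above more concrete, here is a minimal sketch of how the runtime occupancy functions can pick a launch configuration instead of a hand-tuned block size. The kernel `scale`, the helper `blocksFor`, the `CHECK` macro, and the problem size are hypothetical and exist only for illustration; `cudaOccupancyMaxPotentialBlockSize` and `cudaOccupancyMaxActiveBlocksPerMultiprocessor` are the CUDA runtime calls. It assumes a CUDA-capable GPU and a toolkit of version 6.5 or later.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical example kernel: scales each element of a vector in place.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical helper: number of blocks needed to cover n elements.
int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

int main()
{
    const int n = 1 << 20;  // illustrative problem size
    float *d_data = NULL;
    CHECK(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    // Ask the runtime for a block size that maximizes occupancy for this
    // kernel on the current device (no dynamic shared memory, no size limit).
    int minGridSize = 0;
    int blockSize = 0;
    CHECK(cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                             scale, 0, 0));

    // Launch with the suggested block size, covering all n elements.
    int gridSize = blocksFor(n, blockSize);
    scale<<<gridSize, blockSize>>>(d_data, 2.0f, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Query how many blocks of that size can be resident on one multiprocessor.
    int blocksPerSM = 0;
    CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, scale,
                                                        blockSize, 0));

    printf("suggested block size: %d, min grid size: %d, blocks per SM: %d\n",
           blockSize, minGridSize, blocksPerSM);

    CHECK(cudaFree(d_data));
    return 0;
}
```

The reported values depend on the GPU the program runs on, which is the point of the API: the same source can adapt its launch configuration across architectures.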
In related news, Nvidia’s Mark Harris has posted “10 Ways CUDA 6.5 Improves Performance and Productivity.”