In this special guest feature from Scientific Computing World, Robert Roe writes that software scalability and portability may be even more important than energy efficiency to the future of HPC. “As the HPC market searches for the optimal strategy to reach exascale, it is clear that the major roadblock to improving the performance of applications will be the scalability of software, rather than the hardware configuration – or even the energy costs associated with running the system.”
Today ArrayFire announced the release of Version 3.0 of its high-speed software library for GPU computing. The new version features major changes to ArrayFire’s visualization library, a new CPU backend, and dense linear algebra for OpenCL devices. It also includes improvements across the board for ArrayFire’s OpenCL backend.
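To make the dense linear algebra support concrete, here is a minimal, hedged sketch of a solve written against ArrayFire’s C++ interface. The calls used (af::info, af::randu, af::solve, af::matmul, af::norm) are part of ArrayFire’s public API, but which routines are available on which backend in 3.0 should be checked against the release notes rather than assumed from this sketch.

    // Minimal ArrayFire C++ sketch: solve a dense system and check the residual.
    #include <arrayfire.h>
    #include <cstdio>

    int main() {
        af::info();                                    // report the active backend and device
        af::array A = af::randu(512, 512);             // random single-precision matrix
        af::array b = af::randu(512, 1);
        af::array x = af::solve(A, b);                 // dense LAPACK-style solve
        double residual = af::norm(af::matmul(A, x) - b);
        std::printf("residual = %g\n", residual);
        return 0;
    }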
“The use of GPUs to accelerate applications is mainstream nowadays, but their adoption in current clusters presents several drawbacks. In this talk we present the latest developments of the rCUDA remote GPU virtualization framework, which is the only one supporting the most recent CUDA version, in addition to leveraging the InfiniBand fabric for the sake of performance.”
Today Clemson University announced that it has been named a CUDA Teaching Center.
This week Nvidia salutes women who use CUDA for incredible science and engineering. The company has compiled 30 profiles so far, and the advice these women share from their experiences is quite inspiring. “It’s a good way to remind people that women write code, participate in open-source projects, and invent things,” said Lorena Barba from George Washington University. “It’s important to make the technology world more attractive to female students and show them examples of women who are innovators.”
Today Allinea Software announced that the company’s Allinea DDT 4.2.1 debugging software has been tailored to offer full support for NVIDIA CUDA 6.
“Discover killer-app fundamentals including how to tame dynamic parallelism with a robust-performance parallel stack that allows both host- and device-side fast memory allocation and transparent data transfer of arbitrarily complex data structures and general C++ classes. A low-wait approach (related to wait-free methods) is used to create a performance-robust parallel counter. You definitely want to use this counter for histograms! New results extending machine learning and big data analysis to 13 PF/s average sustained performance using 16,384 GPUs in the ORNL Titan supercomputer will be presented.”
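For context on what such a counter improves upon, the sketch below shows the conventional CUDA approach to histogramming with shared-memory atomics. This is only the standard baseline, not the speaker’s low-wait counter, and the grid and block sizes are illustrative choices.

    // Baseline CUDA histogram using shared-memory atomics (not the talk's low-wait method).
    #include <cstdio>
    #include <cuda_runtime.h>

    #define NUM_BINS 256

    __global__ void histogram(const unsigned char *data, int n, unsigned int *bins) {
        __shared__ unsigned int local[NUM_BINS];
        for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
            local[i] = 0;                              // clear the block-local counters
        __syncthreads();

        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
            atomicAdd(&local[data[i]], 1u);            // contended updates stay in shared memory
        __syncthreads();

        for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
            atomicAdd(&bins[i], local[i]);             // one global update per bin per block
    }

    int main() {
        const int n = 1 << 20;
        unsigned char *d_data;
        unsigned int *d_bins;
        cudaMalloc(&d_data, n);
        cudaMemset(d_data, 7, n);                      // toy input: every byte equals 7
        cudaMalloc(&d_bins, NUM_BINS * sizeof(unsigned int));
        cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));

        histogram<<<64, 256>>>(d_data, n, d_bins);

        unsigned int h_bins[NUM_BINS];
        cudaMemcpy(h_bins, d_bins, sizeof(h_bins), cudaMemcpyDeviceToHost);
        std::printf("bin 7 = %u (expected %d)\n", h_bins[7], n);

        cudaFree(d_data);
        cudaFree(d_bins);
        return 0;
    }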
“Bulk leverages Hyper-Q and CUDA streams to run concurrent tasks on the GPU. It lets the programmer describe a parallel task (e.g. sort, for_each, reduction, etcetera) as a hierarchical grouping of execution agents.”
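Bulk’s own interface is not shown here; as a hedged illustration of the mechanism it builds on, the sketch below submits two independent tasks on separate CUDA streams, which Hyper-Q-capable GPUs can execute concurrently. The kernel and sizes are illustrative assumptions, not Bulk code.

    // Plain CUDA streams sketch: two independent tasks that Hyper-Q can overlap.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *x, int n, float a) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= a;
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b;
        cudaMalloc(&a, n * sizeof(float));
        cudaMalloc(&b, n * sizeof(float));

        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);

        // Each task is issued on its own stream, so the GPU is free to run them concurrently.
        scale<<<(n + 255) / 256, 256, 0, s0>>>(a, n, 2.0f);
        scale<<<(n + 255) / 256, 256, 0, s1>>>(b, n, 3.0f);

        cudaStreamSynchronize(s0);
        cudaStreamSynchronize(s1);
        std::printf("both stream tasks finished\n");

        cudaStreamDestroy(s0);
        cudaStreamDestroy(s1);
        cudaFree(a);
        cudaFree(b);
        return 0;
    }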
Mark Harris from Nvidia presents this talk from SC13. “The performance and efficiency of CUDA, combined with a thriving ecosystem of programming languages, libraries, tools, training, and services, have helped make GPU computing a leading HPC technology. Learn how powerful new features in CUDA 6 make GPU computing easier than ever, helping you accelerate more of your application with much less code.”
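One CUDA 6 feature behind the “much less code” claim is Unified Memory, which replaces explicit host-to-device copies with a single managed allocation. The sketch below shows the pattern using standard CUDA runtime calls; it is a minimal illustration, not code from the talk.

    // Unified Memory sketch: one allocation visible to both CPU and GPU, no cudaMemcpy.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void increment(int *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;
    }

    int main() {
        const int n = 1024;
        int *data;
        cudaMallocManaged(&data, n * sizeof(int));     // managed memory, shared by host and device
        for (int i = 0; i < n; ++i) data[i] = i;

        increment<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();                       // wait before touching data on the host again

        std::printf("data[0] = %d, data[%d] = %d\n", data[0], n - 1, data[n - 1]);
        cudaFree(data);
        return 0;
    }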
Today Nvidia announced CUDA 6, the latest version of the company’s parallel computing platform designed to make parallel programming easier than ever.