
Search Results for: cuda

Video: CUDA 6 and Beyond


In this video, Nvidia’s Mark Harris provides a detailed look at the top new features of CUDA 6, including a deep-dive review of Unified Memory, which simplifies GPU programming by automatically migrating data between the CPU and GPU.
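The Unified Memory feature described above replaces explicit host/device copies with a single allocation usable from both sides. A minimal sketch, assuming a CUDA 6-or-later toolkit (the kernel and array size are illustrative, not from the talk):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: increment each element in place.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1024;
    int *data;
    // One allocation visible to both CPU and GPU; the CUDA runtime
    // migrates the data between them automatically.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;       // written on the CPU

    increment<<<(n + 255) / 256, 256>>>(data, n);  // updated on the GPU
    cudaDeviceSynchronize();   // wait before the CPU reads the results

    printf("data[0] = %d, data[%d] = %d\n", data[0], n - 1, data[n - 1]);
    cudaFree(data);
    return 0;
}
```

Note there is no cudaMemcpy in either direction; before Unified Memory, the same program would need separate host and device buffers plus explicit copies around the kernel launch.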

Managing the GPUs of Your Cluster in a Flexible Way with rCUDA


In this talk, we introduce the rCUDA remote GPU virtualization framework, which has been shown to be the only such framework that supports the most recent CUDA versions, in addition to leveraging the InfiniBand interconnect for performance. We also present the latest developments within the framework, related to the use of low-power processors, enhanced job schedulers, and virtual machine environments.

How New Features in CUDA 6 Make GPU Acceleration Easier


Mark Harris from Nvidia presents this talk from SC13. “The performance and efficiency of CUDA, combined with a thriving ecosystem of programming languages, libraries, tools, training, and services, have helped make GPU computing a leading HPC technology. Learn how powerful new features in CUDA 6 make GPU computing easier than ever, helping you accelerate more of your application with much less code.”

CUDA 6 Release Simplifies Parallel Programming With Unified Memory, Drop-In Libraries

Today Nvidia announced CUDA 6, the latest version of the company’s parallel computing platform designed to make parallel programming easier than ever.

Allinea DDT Announces Support for NVIDIA CUDA 5.5 and CUDA on ARM

Today Allinea Software announced support for version 5.5 of the NVIDIA CUDA parallel programming toolkit. The new release includes debugging support for C++11, GNU 4.8 compilers, and ARMv7 architectures, which will soon power hybrid platforms with lower energy consumption for HPC.

rCUDA – Leveraging Low-Power Processors & InfiniBand Interconnects


In this talk, we introduce the rCUDA remote GPU virtualization framework, which leverages the InfiniBand interconnect for performance.

Rob Farber Tutorial on Atomic Operations and Low-Wait Algorithms in CUDA

When used correctly, atomic operations can help implement a wide range of generic data structures and algorithms in the massively threaded GPU programming environment.
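A classic case where atomics are required is a histogram: many threads may increment the same bin concurrently, so the read-modify-write must be atomic or updates are lost. A hedged sketch, not taken from the tutorial (bin count and data are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread classifies one value and bumps the matching bin.
// atomicAdd makes the increment safe against concurrent updates
// from other threads hitting the same bin.
__global__ void histogram(const int *values, int n,
                          unsigned int *bins, int nbins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&bins[values[i] % nbins], 1u);
}

int main() {
    const int n = 4096, nbins = 16;
    int *values;
    unsigned int *bins;
    cudaMallocManaged(&values, n * sizeof(int));
    cudaMallocManaged(&bins, nbins * sizeof(unsigned int));
    for (int i = 0; i < n; ++i) values[i] = i;       // uniform test data
    for (int b = 0; b < nbins; ++b) bins[b] = 0;

    histogram<<<(n + 255) / 256, 256>>>(values, n, bins, nbins);
    cudaDeviceSynchronize();

    for (int b = 0; b < nbins; ++b)
        printf("bin %2d: %u\n", b, bins[b]);
    cudaFree(values);
    cudaFree(bins);
    return 0;
}
```

Replacing atomicAdd with a plain `bins[...] += 1` here would produce nondeterministic undercounts, which is exactly the hazard atomic operations exist to prevent.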

Video: CUDA Development on ARM Platforms

In this video, Nvidia’s Mark Ebersole demonstrates CUDA development on an ARM-based platform. With CUDA 5.5, it is now possible to compile and run CUDA applications on ARM-based systems such as the Kayla development platform. In addition to native compilation on an ARM-based CPU system, it is also possible to cross-compile for ARM systems, allowing […]

CUDA 5.5 Goes GA with Support for ARM Platforms

Over at Tom’s Hardware, Kevin Parrish writes that the newly released CUDA 5.5 programming model now supports ARM platforms. It also features a number of advanced performance and productivity features including enhanced Hyper-Q support, MPI workload prioritization, guided performance analysis, and fast cross-compile on x86. Since developers started using CUDA in 2006, successive generations of […]

Prototype Algorithms and Test CUDA Kernels in MATLAB

Over at the Nvidia Developer Zone, Daniel Armyr and Dan Doherty from MathWorks describe how you can use MATLAB to support your development of CUDA C and C++ kernels. While algorithms written for the GPU are often much faster, the process of building a framework for developing and testing them can be time-consuming. Many programmers […]
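The workflow described above pairs a hand-written CUDA kernel with MATLAB as the test harness: the kernel is compiled to PTX and invoked via MATLAB's `parallel.gpu.CUDAKernel` interface. A minimal sketch of such a kernel (the file name, SAXPY operation, and MATLAB call sequence in the comments are illustrative, not from the article):

```cuda
// saxpy.cu -- compile to PTX for loading from MATLAB:
//   nvcc -ptx saxpy.cu
//
// Illustrative MATLAB side (Parallel Computing Toolbox):
//   k = parallel.gpu.CUDAKernel('saxpy.ptx', 'saxpy.cu');
//   k.ThreadBlockSize = 256;
//   k.GridSize = ceil(n / 256);
//   y = feval(k, a, x, y, n);   % compare y against a*x + y in MATLAB
__global__ void saxpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
```

The appeal of this setup is that MATLAB supplies the input data, reference results, and plotting, so the kernel can be validated without writing a standalone C test driver.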