Podcast: CUDA Programming for GPUs

In this Programming Throwdown podcast, Mark Harris from Nvidia describes Cuda programming for GPUs. “CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU). With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for GPU computing with CUDA.”

Video: UPC++ Parallel Programming Extension

In this video from the 2016 OpenFabrics Workshop, Zili Zheng from LBNL presents: UPC++. “UPC++ is a parallel programming extension for developing C++ applications with the partitioned global address space (PGAS) model. UPC++ has demonstrated excellent performance and scalability with applications and benchmarks such as global seismic tomography, Hartree-Fock, BoxLib AMR framework and more. In this talk, we will give an overview of UPC++ and discuss the opportunities and challenges of leveraging modern network features.”

Brookhaven Lab is the latest Nvidia GPU Research Center

Today Nvidia announced that Brookhaven National Laboratory has been named a 2016 GPU Research Center. “The center will enable Brookhaven Lab to collaborate with Nvidia on the development of widely deployed codes that will benefit from more effective GPU use, and in the delivery of on-site GPU training to increase staff and guest researchers’ proficiency,” said Kerstin Kleese van Dam, director of CSI and chair of the Lab’s Center for Data-Driven Discovery.

Who Will Write Next-generation Software?

In this special guest feature from Scientific Computing World, Robert Roe writes that software scalability and portability may be more important even than energy efficiency to the future of HPC. “As the HPC market searches for the optimal strategy to reach exascale, it is clear that the major roadblock to improving the performance of applications will be the scalability of software, rather than the hardware configuration – or even the energy costs associated with running the system.”

ArrayFire Updates GPU Software Library

Today ArrayFire announced the release of Version 3.0 of their high-speed software library for GPU computing. The new version features major changes to ArrayFire’s visualization library, a new CPU backend, and dense linear algebra for OpenCL devices. It also includes improvements across the board for ArrayFire’s OpenCL backend.

Video: Increasing Cluster Throughput while Reducing Energy Consumption for GPU Workloads

“The use of GPUs to accelerate applications is mainstream nowadays, but their adoption in cur- rent clusters presents several drawbacks. In this talk we present the last developments of the rCUDA remote GPU virtualization framework, which is the only one supporting the most recent CUDA version, in addition to leverage the InfiniBand fabric for the sake of performance.”

Clemson Becomes the Latest CUDA Teaching Center

Today Clemson University Monday announced that it has been named a CUDA Teaching Center.

Nvidia Salutes Women Who CUDA

This week Nvidia salutes Women who use CUDA for incredible science and engineering. They’ve compiled 30 profiles so far, and the advice they share from their experiences is quite inspiring. “It’s a good way to remind people that women write code, participate in open-source projects, and invent things,” said Lorena Barba from George Washington University. “It’s important to make the technology world more attractive to female students and show them examples of women who are innovators.”

Allinea DDT Debugger Adds Support for NVIDIA CUDA 6

Today Allinea Software today announced that the company’s Allinea DDT 4.2.1 debugging software has been tailored to offer full support for NVIDIA CUDA 6

Video: Rob Farber Teaches Killer App Fundamentals at GTC 2014

“Discover killer-app fundamentals including how to tame dynamic parallelism with a robust-performance parallel stack that allows both host and device side fast memory allocation and transparent data transfer of arbitrarily complex data structures and general C++ classes. A low-wait approach (related to wait-free methods)is used to create a performance robust parallel counter. You definitely want to use this counter for histograms! New results extending machine learning and big data analysis to 13 PF/s average sustained performance using 16,384 GPUs in the ORNL Titan supercomputer will be presented.”