In this video from the 2016 OpenFabrics Workshop, Zili Zheng from LBNL presents: UPC++. “UPC++ is a parallel programming extension for developing C++ applications with the partitioned global address space (PGAS) model. UPC++ has demonstrated excellent performance and scalability with applications and benchmarks such as global seismic tomography, Hartree-Fock, BoxLib AMR framework and more. In this talk, we will give an overview of UPC++ and discuss the opportunities and challenges of leveraging modern network features.”
Today Nvidia announced that Brookhaven National Laboratory has been named a 2016 GPU Research Center. “The center will enable Brookhaven Lab to collaborate with Nvidia on the development of widely deployed codes that will benefit from more effective GPU use, and in the delivery of on-site GPU training to increase staff and guest researchers’ proficiency,” said Kerstin Kleese van Dam, director of CSI and chair of the Lab’s Center for Data-Driven Discovery.
In this special guest feature from Scientific Computing World, Robert Roe writes that software scalability and portability may be more important even than energy efficiency to the future of HPC. “As the HPC market searches for the optimal strategy to reach exascale, it is clear that the major roadblock to improving the performance of applications will be the scalability of software, rather than the hardware configuration – or even the energy costs associated with running the system.”
Today ArrayFire announced the release of Version 3.0 of their high-speed software library for GPU computing. The new version features major changes to ArrayFire’s visualization library, a new CPU backend, and dense linear algebra for OpenCL devices. It also includes improvements across the board for ArrayFire’s OpenCL backend.
“The use of GPUs to accelerate applications is mainstream nowadays, but their adoption in cur- rent clusters presents several drawbacks. In this talk we present the last developments of the rCUDA remote GPU virtualization framework, which is the only one supporting the most recent CUDA version, in addition to leverage the InfiniBand fabric for the sake of performance.”
Today Clemson University Monday announced that it has been named a CUDA Teaching Center.
This week Nvidia salutes Women who use CUDA for incredible science and engineering. They’ve compiled 30 profiles so far, and the advice they share from their experiences is quite inspiring. “It’s a good way to remind people that women write code, participate in open-source projects, and invent things,” said Lorena Barba from George Washington University. “It’s important to make the technology world more attractive to female students and show them examples of women who are innovators.”
Today Allinea Software today announced that the company’s Allinea DDT 4.2.1 debugging software has been tailored to offer full support for NVIDIA CUDA 6
“Discover killer-app fundamentals including how to tame dynamic parallelism with a robust-performance parallel stack that allows both host and device side fast memory allocation and transparent data transfer of arbitrarily complex data structures and general C++ classes. A low-wait approach (related to wait-free methods)is used to create a performance robust parallel counter. You definitely want to use this counter for histograms! New results extending machine learning and big data analysis to 13 PF/s average sustained performance using 16,384 GPUs in the ORNL Titan supercomputer will be presented.”
“Bulk leverages Hyper-Q and CUDA streams to run concurrent tasks on the GPU. It lets the programmer describe a parallel task (e.g. sort, for_each, reduction, etcetera) as a hierarchical grouping of execution agents.”