“Discover killer-app fundamentals including how to tame dynamic parallelism with a robust-performance parallel stack that allows both host and device side fast memory allocation and transparent data transfer of arbitrarily complex data structures and general C++ classes. A low-wait approach (related to wait-free methods)is used to create a performance robust parallel counter. You definitely want to use this counter for histograms! New results extending machine learning and big data analysis to 13 PF/s average sustained performance using 16,384 GPUs in the ORNL Titan supercomputer will be presented.”
“During the previous GTC, Murex has shown how the company had adapted their generic Monte-Carlo & PDE codes compatible with a payoff language. With one more year of experience with GPUs and OpenCL Murex will show how the company has broadened the usage of GPUs for other subjects like vanilla screening or model calibration and focus on their new challenge: use as many GPUs as possible for one single computation.”
“How do you cross-correlate 10,000 signals 100 million times per second? This is an example of the type of compute-bound problem facing modern radio astronomy, which, paralleling the paradigm shift in computing architectures, has transitioned from monolithic single-dish telescopes to massive arrays of smaller antennas. In this session we will describe how general-purpose HPC installations can be used to achieve scaling of a cross-correlation pipeline to petascale with all the flexibility of a purely-software implementation.”
“For NVLink to have its highest value it must function properly with unified memory. That means that the Memory Management Units in the CPUs have to be aware of NVLink DMA operations and update appropriate VM structures. The operating system needs to know when memory pages have been altered via NVLink DMA – and this can’t be solely the responsibility of the drivers. Tool developers also need to know details so that MPI or other communications protocols can make use of the new interconnect.”
In this video from GTC 2014, Steve Oberlin from Nvidia describes his new role as Chief Technical Officer for Accelerated Computing. Along the way, he discusses: the HPC lessons learned from the CRAY T3E and other systems, Nvidia’s plans to tackle the challenges of the HPC Memory Wall, the current status on Project Denver, and how Nvidia plans to couple to the POWER architecture in future systems.