“Discover killer-app fundamentals including how to tame dynamic parallelism with a robust-performance parallel stack that allows both host and device side fast memory allocation and transparent data transfer of arbitrarily complex data structures and general C++ classes. A low-wait approach (related to wait-free methods)is used to create a performance robust parallel counter. You definitely want to use this counter for histograms! New results extending machine learning and big data analysis to 13 PF/s average sustained performance using 16,384 GPUs in the ORNL Titan supercomputer will be presented.”
“Bulk leverages Hyper-Q and CUDA streams to run concurrent tasks on the GPU. It lets the programmer describe a parallel task (e.g. sort, for_each, reduction, etcetera) as a hierarchical grouping of execution agents.”
“It’s a good way to remind people that women write code, participate in open-source projects, and invent things,” Barba said. “It’s important to make the technology world more attractive to female students and show them examples of women who are innovators.”
“During the previous GTC, Murex has shown how the company had adapted their generic Monte-Carlo & PDE codes compatible with a payoff language. With one more year of experience with GPUs and OpenCL Murex will show how the company has broadened the usage of GPUs for other subjects like vanilla screening or model calibration and focus on their new challenge: use as many GPUs as possible for one single computation.”
“How do you cross-correlate 10,000 signals 100 million times per second? This is an example of the type of compute-bound problem facing modern radio astronomy, which, paralleling the paradigm shift in computing architectures, has transitioned from monolithic single-dish telescopes to massive arrays of smaller antennas. In this session we will describe how general-purpose HPC installations can be used to achieve scaling of a cross-correlation pipeline to petascale with all the flexibility of a purely-software implementation.”
“For NVLink to have its highest value it must function properly with unified memory. That means that the Memory Management Units in the CPUs have to be aware of NVLink DMA operations and update appropriate VM structures. The operating system needs to know when memory pages have been altered via NVLink DMA – and this can’t be solely the responsibility of the drivers. Tool developers also need to know details so that MPI or other communications protocols can make use of the new interconnect.”
I produced this time lapse video featuring clips from the opening keynote at GTC 2014. “Nvidia really knows how to wow the audience with a blend of showmanship, state-of-the-art graphics, and dazzling technology.”
In this video from GTC 2014, Steve Oberlin from Nvidia describes his new role as Chief Technical Officer for Accelerated Computing. Along the way, he discusses: the HPC lessons learned from the CRAY T3E and other systems, Nvidia’s plans to tackle the challenges of the HPC Memory Wall, the current status on Project Denver, and how Nvidia plans to couple to the POWER architecture in future systems.
With their upcoming Pascal generation of GPUs, Nvidia plans to bring NVLink and 3D stacked memory to market in their ongoing battle against the HPC memory wall. “This is our 50th episode of Radio Free HPC, so I’d like to offer a Tip of the Hat to my co-hosts, Henry Newman and Dan Olds!”
In this video, Dan Olds and Rich Brueckner from Radio Free HPC take a Tesla Motors electric sedan for a test drive. Dan has a lead foot, so hang on and come along for the ride!