Entries filed under “GPUs”

News related to the use of general-purpose graphics processing units (GPGPUs) in HPC gear.

Video: AMD’s CTO Talks Heterogeneous Systems Architecture

In this video, AMD’s Joe Macri describes the company’s HSA architecture (formerly known as Fusion). Recorded at the 2012 DesignCon conference in Santa Clara.

“The architectural path for the future is clear,” Macri declared. That path will be paved with the programming patterns established on Symmetric Multi-Processor (SMP) systems migrating to the heterogeneous world. The architecture will be open, with published specifications and an open source execution software stack, and heterogeneous cores will be able to work together seamlessly in coherent memory, with low latency dispatch and no software fault lines.

A Tip of the Hat goes to Sylvie Barak at EE Times for pointing us to this video.

Also posted in Compute, HPC Hardware, Video

Interview: Nvidia Updates Cuda Platform to 4.1

This week Nvidia announced the latest update to their Cuda platform for parallel computing. To learn more, I caught up with Will Ramey, Nvidia’s Sr. Product Manager for GPU Computing.

insideHPC: When we talk about a new Cuda platform, are we talking about the Cuda Toolkit plus SDK? Does this new update have a version number?

Will Ramey: Yes, this release is a new version of the CUDA Toolkit and SDK code samples, as well as updated drivers.  The version number for this release is 4.1.

insideHPC: Specifically, what components comprise the platform?

Will Ramey: There are 3 key components to this release (version 4.1):

  1. The CUDA Toolkit is a comprehensive development environment for C and C++ developers building GPU-accelerated applications.  Version 4.1 of CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing application performance.  You’ll also find programming guides, user manuals, API reference, and other documentation to help programmers add GPU acceleration to their applications quickly.  More info at: http://developer.nvidia.com/cuda-toolkit
  2. The CUDA Driver provides a system-level interface for CUDA applications to communicate with the GPUs, and is included in the NVIDIA drivers installer.
  3. NVIDIA also provides an SDK with over 100 GPU Computing SDK code samples, as well as white papers to help developers quickly add GPU acceleration to their applications.  More info at: http://developer.nvidia.com/gpu-computing-sdk

Developers need to install the CUDA Toolkit to build CUDA applications, and the latest NVIDIA drivers so their applications can communicate with the GPUs in their system.  Developers can also choose to install the SDK code samples to learn from the large collection of examples.

To run CUDA applications, end-users only need to install the latest NVIDIA drivers.

insideHPC: What is new within the updated platform?

Will Ramey: In addition to the new LLVM-based compiler that delivers up to 10 percent faster performance, there are a number of significant new features in this release:

  • New & Improved “drop-in” acceleration with GPU-Accelerated Libraries
    • Over 1000 new image processing functions in the NPP library
    • New cuSPARSE tri-diagonal solver up to 10x faster than MKL on a 6 core CPU
    • New support in cuRAND for MRG32k3a and Mersenne Twister (MTGP11213) RNG algorithms
    • Bessel functions now supported in the CUDA standard Math library
    • Up to 2x faster sparse matrix vector multiply using ELL hybrid format
  • Enhanced & Redesigned Developer Tools
    • Redesigned Visual Profiler with automated performance analysis and expert guidance system
    • CUDA-GDB support for multi-context debugging and assert() in device code
    • CUDA-MEMCHECK now detects out of bounds access for memory allocated in device code
    • Parallel Nsight 2.1 CUDA warp watch visualizes variables and expressions across an entire CUDA warp
    • Parallel Nsight 2.1 CUDA profiler now analyzes kernel memory activities, execution stalls and instruction throughput

  • Advanced Programming Features
    • Access to 3D surfaces and cube maps from device code
    • Enhanced no-copy pinning of system memory, cudaHostRegister() alignment and size restrictions removed
    • Peer-to-peer communication between processes
    • Support for resetting a GPU without rebooting the system in nvidia-smi
  • New & Improved SDK Code Samples
    • simpleP2P sample now supports peer-to-peer communication with any Fermi GPU
    • New grabcutNPP sample demonstrates interactive foreground extraction using iterated graph cuts
    • New samples showing how to implement the Horn-Schunck Method for optical flow, perform volume filtering, and read cube map texture
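For readers unfamiliar with the ELL storage mentioned in the library list above, here is a minimal pure-Python sketch of the idea, not cuSPARSE code. ELL pads every row to the same number of stored entries, so each row does an identical amount of work, which maps well onto GPU threads. The hybrid (HYB) variant keeps only a fixed number of entries per row in ELL and spills the overflow into a coordinate list, a step omitted here. The function names `to_ell` and `ell_spmv` are our own, chosen for illustration.

```python
def to_ell(dense):
    """Convert a dense matrix (list of rows) into ELL column/value arrays.

    Every row is padded to the width of the longest row. Padding uses
    column 0 with value 0.0, so padded slots contribute nothing.
    """
    width = max(sum(1 for v in row if v != 0) for row in dense)
    cols, vals = [], []
    for row in dense:
        entries = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(entries)
        cols.append([j for j, _ in entries] + [0] * pad)
        vals.append([v for _, v in entries] + [0.0] * pad)
    return cols, vals


def ell_spmv(cols, vals, x):
    """Compute y = A * x for a matrix stored in ELL form."""
    return [
        sum(v * x[j] for j, v in zip(row_cols, row_vals))
        for row_cols, row_vals in zip(cols, vals)
    ]


if __name__ == "__main__":
    a = [[4, 0, 0],
         [0, 0, 2],
         [1, 3, 0]]
    cols, vals = to_ell(a)
    print(ell_spmv(cols, vals, [1, 2, 3]))
```

On a GPU, each row of the padded arrays would typically be handled by one thread; the regular layout is what makes the memory accesses predictable.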

insideHPC: How do the new components ease code development?

Will Ramey: The new LLVM-based compiler compiles code faster than the old compiler, increasing developer productivity.  As you might expect, the compile time saved varies by application, but we’ve seen some large applications compile more than 60 minutes faster than with the old compiler.

The NVIDIA Visual Profiler has been completely re-designed to streamline developers’ performance analysis workflow.  The new automated performance analysis feature quickly identifies bottlenecks and opportunities to improve application performance, and is integrated with the “Best Practices” documentation guiding developers through the process of optimizing their applications.  Developers can now achieve the full potential of GPU acceleration in their application with significantly less effort.

The new image & signal processing functions in NPP make it easier for more developers to accelerate more of their algorithms on the GPU.

The new tri-diagonal solver in cuSPARSE allows developers to just call the pre-optimized version in the library instead of having to write their own.
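To give a sense of what “writing their own” involves, below is a minimal sketch of the classic serial Thomas algorithm for a tri-diagonal system. It is an illustration only and is not how cuSPARSE solves the problem; GPU libraries generally rely on parallel formulations. The function name `solve_tridiagonal` and the array convention (all four arrays have length n, with `lower[0]` and `upper[n-1]` unused) are assumptions made for this example, and the sketch does no pivoting, so it assumes a well-conditioned (for instance diagonally dominant) system.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tri-diagonal system with the Thomas algorithm.

    Row i of the system reads:
        lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i]
    lower[0] and upper[n-1] are ignored. No pivoting is performed.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # The right-hand side was built from the known solution [1, 2, 3].
    print(solve_tridiagonal([0, 1, 1], [2, 2, 2], [1, 1, 0], [4, 8, 8]))
```

The forward sweep is inherently sequential, which is one reason a pre-optimized parallel library routine is attractive on a GPU.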

insideHPC: How do the new components help speed developer code?

Will Ramey: The new LLVM-based compiler includes several new optimization techniques that allow the compiler to generate more efficient code.  This is another case where the performance improvement will vary depending on the application, but we’re seeing up to 10 percent performance improvement across a variety of applications.

The new RNGs in cuRAND, the image & signal processing functions in NPP, the tri-diagonal solver in cuSPARSE, and so on all help developers quickly adopt pre-optimized routines that take full advantage of the hundreds of cores on the GPU.
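As background on one of the RNGs named above, here is a hedged, host-side sketch of the MRG32k3a recurrence as published by Pierre L'Ecuyer. It is a plain Python illustration of the algorithm, not the cuRAND API; the class name `MRG32k3a` and the default seed of 12345 for all six state words are choices made for this example.

```python
M1 = 4294967087  # modulus of the first component, 2**32 - 209
M2 = 4294944443  # modulus of the second component, 2**32 - 22853


class MRG32k3a:
    """Combined multiple recursive generator (two order-3 components)."""

    def __init__(self, seed=12345):
        # Each list holds [x(n-3), x(n-2), x(n-1)] for one component.
        # Seeds must be below the moduli and not all zero.
        self.s1 = [seed, seed, seed]
        self.s2 = [seed, seed, seed]

    def random(self):
        """Return the next uniform variate in the open interval (0, 1)."""
        # Component 1: x(n) = 1403580*x(n-2) - 810728*x(n-3) mod M1
        p1 = (1403580 * self.s1[1] - 810728 * self.s1[0]) % M1
        self.s1 = [self.s1[1], self.s1[2], p1]
        # Component 2: x(n) = 527612*x(n-1) - 1370589*x(n-3) mod M2
        p2 = (527612 * self.s2[2] - 1370589 * self.s2[0]) % M2
        self.s2 = [self.s2[1], self.s2[2], p2]
        # Combine the two components and scale into (0, 1).
        z = (p1 - p2) % M1
        return (z if z > 0 else M1) / (M1 + 1)


if __name__ == "__main__":
    rng = MRG32k3a()
    print([rng.random() for _ in range(3)])
```

A library implementation adds what this sketch leaves out, such as skipping ahead so that thousands of GPU threads can each draw from a non-overlapping substream.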

insideHPC: If I had the most current version of Cuda yesterday, what’s new that I can download today?

Will Ramey: Today you can download the new CUDA Toolkit, SDK code samples, and drivers.  They are available for Linux, MacOS and Windows.


Also posted in HPC, HPC Software, Tools

Cyprus Unveils Largest Super in Eastern Mediterranean

This week the largest supercomputer in the Eastern Mediterranean was unveiled at the Cyprus Institute’s Computation-based Science and Technology Research Centre (CaSToRC). The IBM system is a hybrid CPU/GPU cluster currently holding 1,392 processors in 116 nodes.

The HPC facility “will enable cutting-edge research,” University of Cyprus rector Constantinos Christofides said, adding that times of economic crisis are exactly when research and innovation are necessary for growth. The Cy-Tera facility will serve the research needs of the Cyprus Institute and a host of partners, including the University of Cyprus, Jordan’s Synchrotron-light for Experimental Science and Applications in the Middle East (SESAME), the University of Illinois’ National Center for Supercomputing Applications, and the Jülich Supercomputing Centre in Germany, among others.

Read the Full Story.

Also posted in HPC, HPC Hardware, New Installations

Implementing Molecular Dynamics on Hybrid High Performance Computers

An author calling himself Morgoth Bauglir has posted a paper on using LAMMPS molecular dynamics software for distributed memory parallel hybrid machines.

The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines.

Read the Full Story.

For the record, Morgoth is a fictional character who, as Sauron’s master, was the source of all evil in J. R. R. Tolkien’s Middle-earth.

Also posted in HPC, HPC Hardware

Featured Sessions from GPU Technology Conference in Beijing


Nvidia has posted videos and slides from the recent GPU Technology Conference in Beijing. You can see the full listing here, but we’ve gone ahead and highlighted some of the featured talks in the HPC track.

Weather & Climate Modeling

Digital Manufacturing

GPUs in HPC

Also posted in Digital Manufacturing, Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video

Video: GPU Technology Conference Keynote

In this video, Nvidia CEO Jen-Hsun Huang keynotes the GPU Technology Conference in Beijing. Recorded Dec. 14, 2011. Download the Slides (PDF).

Also posted in HPC, HPC Hardware, Video

Moscow State Taps T-Platforms to Build 10 Petaflops Super

By Timothy Prickett Morgan

In what rings as almost an echo of Cold War-era scientific competition, Moscow State University is putting together a supercomputer it hopes will take it back up the international rankings.

Now, MSU has tapped its favorite contractor, T-Platforms, to build a hybrid CPU-GPU machine that will weigh in at 10 petaflops of peak performance and would vault it back towards the top of the HPC hit parade. T-Platforms has built several generations of rack and blade setups for MSU in the past couple of years.

MSU’s current machine, nicknamed “Lomonosov” after the 18th century Russian polymath, is also a ceepie-geepie machine that augments the number-crunching oomph of Xeon x86 processors from Intel with fanless Tesla X2070 GPU coprocessors from Nvidia.

T-Platforms’ T-Blade 2 chassis and blades are among the most cleverly engineered boxes on the market, being able to cram 16 server nodes, each with two Xeon processors and two Tesla coprocessors, into a 7U chassis and not actually melt. (See this story for the full details of the current Lomonosov machine.)

Lomonosov uses quad data rate (QDR) InfiniBand to interconnect the nodes, and the GPUs are lashed to the CPUs (one per socket) through the PCI-Express 2.0 bus in the Intel chipset. It has a peak theoretical performance of 1.37 petaflops, with 510 teraflops coming from a chunk of machines based only on x86 processors – specifically four-core Xeon X5570s and six-core Xeon X5670s.

There are a total of 43,520 cores on this part of the box, which is based on an early T-Blade blade server. This initial Lomonosov machine was augmented with 777 ceepie-geepie T-Blade 2 blade servers, which have a total 6,216 Xeon cores and 1,554 GPUs with a total of 795,648 cores. The GPUs deliver the vast majority of the additional 863 teraflops coming from the hybrid CPU-GPU blades.
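The figures quoted for Lomonosov can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the article; the function name `lomonosov_figures` and the dictionary keys are our own labels for this illustration.

```python
def lomonosov_figures():
    """Derive per-blade and per-GPU ratios from the quoted totals."""
    x86_only_tflops = 510   # peak of the x86-only partition
    hybrid_tflops = 863     # peak added by the CPU-GPU blades
    blades = 777            # T-Blade 2 hybrid blade servers
    xeon_cores = 6216       # Xeon cores across the hybrid blades
    gpus = 1554             # Tesla coprocessors across the hybrid blades
    gpu_cores = 795648      # GPU cores across all coprocessors

    return {
        "peak_pflops": (x86_only_tflops + hybrid_tflops) / 1000,
        "gpus_per_blade": gpus / blades,
        "xeon_cores_per_blade": xeon_cores / blades,
        "cores_per_gpu": gpu_cores / gpus,
    }


if __name__ == "__main__":
    for name, value in lomonosov_figures().items():
        print(f"{name}: {value}")
```

The two partitions sum to 1.373 petaflops, consistent with the 1.37 petaflops peak quoted earlier, and the totals work out to two GPUs and eight Xeon cores per blade, with 512 cores counted per GPU.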

T-Platforms and Moscow State are not being terribly specific about the configuration of Lomonosov’s successor. Rather than upgrading the existing machine, Moscow State this time around is asking T-Platforms to build a new 10 petaflops cluster based on a dense-pack rack server design, one that Alexey Komkov, vice president of products and technology at T-Platforms, says will include a custom rack design.

The machine will probably look like the rackish-bladish tray servers sold by Hewlett-Packard and Dell to hyperscale data center and HPC customers these days. The custom racks will include warm water cooling on the server nodes, according to Komkov.

T-Platforms has pitched a mix of compute nodes to Moscow State to come up with a 10 petaflopper. One node type will use a mix of either “Sandy Bridge” or “Ivy Bridge” Xeon processors from Intel, most likely two-socket nodes.

The second type of node in the machine will sport Sandy Bridge Xeons (again, very likely the Xeon E5s, due in early 2012) plus Nvidia’s impending “Kepler” next-generation GPU coprocessors (also due in 2012 and also running late like the Xeon E5s). The third node type will mix Sandy Bridge processors and Intel Many Integrated Core (MIC) coprocessors if they are available in 2012 for inclusion in the machines. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.

Also posted in Compute, HPC, HPC Hardware

2012 Server Roadmap Forks at Power and Performance

Michael J. Miller writes that the 2012 server market will undergo more fragmentation with a renewed emphasis on HPC and the emergence of microservers that draw less power.

In 2012, we’re likely to see even more of these systems focus on GPU computing. Nvidia continues to push its Tesla processors into more systems and says 35 of the top 500 computers in the world already use its processors, typically in a combination of traditional processors and GPU cores. Next year, Nvidia plans an upgrade to its basic architecture, moving from its 40nm Fermi design to a 28nm design known as Kepler. It is almost certain this will be part of future Tesla products, as well. I expect we’ll know more at the company’s GPU computing show next spring. In the meantime, the company has open-sourced its CUDA development platform for GPU computing.

Read the Full Story.

Also posted in Compute, HPC, HPC Hardware

HokieSpeed Super to become a War Horse for Researchers

Steven Mackay writes that Virginia Tech’s new “HokieSpeed” supercomputer will be a veritable “War Horse” for researchers working on diverse science.

You may remember how Virginia Tech crashed the supercomputing arena in 2003 with System X, a novel Apple server cluster powered by the company’s G5 processors. Ranked at number 96 on the TOP500 and number 11 on the Green500, the new HokieSpeed supercomputer is 22 times faster and yet a quarter of the size of X, with a double-precision peak of 240 teraflops.

“HokieSpeed is a versatile heterogeneous supercomputing instrument, where each compute node consists of energy-efficient central-processing units and high-end graphics-processing units,” said Wu Feng, associate professor with the Virginia Tech College of Engineering’s computer science and electrical and computer engineering departments. “This instrument will empower faculty members, students, and staff across disciplines to tackle problems previously viewed as intractable or that required heroic efforts and significant domain-specific expertise to solve.”

Each HokieSpeed node contains two 2.40-gigahertz Intel Xeon E5645 6-core central processing units and two NVIDIA M2050/C2050 448-core GPUs, which reside on a Supermicro 2026GT-TRF motherboard.
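As a rough sanity check, the node configuration can be related to the 240-teraflops double-precision peak quoted above. The sketch below rests on two assumptions that do not come from the article: 4 double-precision floating-point operations per cycle per Xeon core, and 515 gigaflops of double-precision peak per M2050/C2050 GPU. The names used are hypothetical and the result is an estimate, not an official node count.

```python
# Figures taken from the article.
CPU_SOCKETS = 2
CORES_PER_CPU = 6
CPU_GHZ = 2.40
GPUS_PER_NODE = 2
SYSTEM_PEAK_TFLOPS = 240

# Assumptions, not stated in the article.
ASSUMED_CPU_FLOPS_PER_CYCLE = 4   # double precision, per core
ASSUMED_GPU_DP_GFLOPS = 515       # double precision, per GPU


def node_peak_gflops():
    """Estimated double-precision peak of one node, in gigaflops."""
    cpu = CPU_SOCKETS * CORES_PER_CPU * CPU_GHZ * ASSUMED_CPU_FLOPS_PER_CYCLE
    gpu = GPUS_PER_NODE * ASSUMED_GPU_DP_GFLOPS
    return cpu + gpu


def implied_node_count():
    """Number of such nodes needed to reach the quoted system peak."""
    return SYSTEM_PEAK_TFLOPS * 1000 / node_peak_gflops()


if __name__ == "__main__":
    print(f"node peak: {node_peak_gflops():.1f} GFLOPS")
    print(f"implied nodes: {implied_node_count():.1f}")
```

Under these assumptions the GPUs supply roughly nine tenths of each node's peak, and the quoted system peak corresponds to a little over 200 nodes.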

HokieSpeed is now in the final stages of acceptance testing. Read the Full Story.

Also posted in Compute, HPC, HPC Hardware, New Installations

Video: HPC Molecular Simulations using LAMMPS

In this video, Paul Crozier from Sandia National Laboratories presents: HPC molecular simulations using LAMMPS.

Recorded at the HPC Advisory Council Stanford Workshop on Dec. 7, 2011. Download the Slides (PDF).

Also posted in Events, HPC Advisory Council Workshop, HPC Hardware, HPC Software, Video

Contest Winner – Cuda on ARM Developer Kit to be Named CARMA

Thanks to its crowdsourcing contest, Nvidia has named its ARM developer kit CARMA.

Powered by a Tegra 3 quad-core ARM-based processor and an NVIDIA CUDA-enabled GPU, the CARMA DevKit is being developed to support energy-efficient HPC projects using ARM-based GPU computing. In fact, this technology will power the Barcelona Supercomputing Center’s ARM-based GPU supercomputer.

CARMA is expected to start shipping in Q2 2012. Read the Full Story.

Also posted in HPC, HPC Hardware

Video: GPU Computing – Past, Present & Future

In this video, Nvidia’s Ian Buck presents: GPU Computing: Past, Present & Future.

Learn how the GPU evolved from its humble beginning as a “VGA Accelerator” to become a massively parallel general purpose accelerator for heterogeneous computing systems. This talk will focus on significant milestones in GPU hardware architecture and software programming models, covering several key concepts that demonstrate why advances in GPU parallel processing performance and power efficiency will continue to outpace CPUs.

Recorded at the GPU Technology Conference on Dec. 15, 2011.

Also posted in Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video

New GPU Cluster at Bielefeld University to Crack Quantum Chromodynamics

Bielefeld University announced this week that it will install a new GPU hybrid supercomputer in 2012. Used exclusively for Quantum Chromodynamics, the hybrid system will be equipped with 400 Tesla GPUs with a cumulative peak performance of about 500 Teraflops.

“We are delighted about the opportunity to prove our expertise in high-end GPU computing with the new high-performance cluster in theoretical physics at the Bielefeld University,” said Gabriele Nikisch, the managing director of Bremen-based sysGen GmbH, a certified NVIDIA Tesla Preferred Partner that develops and produces complete solutions for HPC and GPU Computing as well as server and storage systems.

Also posted in HPC, HPC Hardware

Video: (Astro)-Physical GPU Supercomputing in China and Elsewhere – Galaxies, Black Holes, Gravitational Waves

In this video, Rainer Spurzem, a visiting professor at the Chinese Academy of Sciences in Beijing, presents: (Astro)-Physical GPU Supercomputing in China and elsewhere - galaxies, black holes, gravitational waves.

New powerful supercomputers have been built using graphical processing units (GPU) for general purpose computing. China has obtained top ranks in the list of the fastest supercomputers in the world with such systems. The research of Chinese Academy of Sciences and National Astronomical Observatory in Beijing with such GPU clusters will be reviewed, present and future applications in computer simulation and data processing discussed. We present particle- and mesh-based algorithms for astrophysics using hundreds to thousands of GPUs for one single application run in a parallel message passing environment, some with detailed timing models. Future perspectives for GPU and FPGA accelerated computing will be discussed and international collaboration in the ICCS (International Center for Computational Science). GPU and other ‘green’ supercomputing hardware is a stepping stone on the path to reach Exascale supercomputing. An application to astrophysical Computer Simulations of Dense Star Clusters in Galactic Nuclei with Supermassive Black Holes is presented. We use large high-accuracy direct N-body simulations with Hermite scheme and block-time steps, parallelised across a large number of nodes on the large scale and across many GPU thread processors on each node on the small scale. We reach a sustained performance of more than 350 Tflop/s for a science run on 1600 Fermi C2050 GPUs; a performance model is presented and studies for the largest GPU clusters in China with up to Petaflop/s performance and 7000 Fermi GPU cards. Our simulation proceeds to the complete relativistic merger of the black holes, including Post-Newtonian corrections to gravitational forces and the relevance of the results for the cosmological background of gravitational radiation is briefly touched. We discuss the relevance of this for pulsar timing bands and for frequency bands of new space based gravitational wave missions in China and Europe.
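The direct N-body approach named in the abstract sums the gravitational pull of every particle on every other particle. A Hermite scheme needs both the acceleration and its time derivative (the jerk) for each particle, and this all-pairs loop is the part typically offloaded to GPUs. The following is a minimal serial sketch of that force kernel only, in units where G = 1; it is not the speakers' code, it omits the predictor-corrector steps and block time steps, and the function name `acc_and_jerk` and the optional `softening` parameter are assumptions made for illustration.

```python
import math


def acc_and_jerk(masses, positions, velocities, softening=0.0):
    """All-pairs gravitational acceleration and jerk, with G = 1.

    positions and velocities are lists of [x, y, z] per particle.
    Returns (acc, jerk), each a list of [x, y, z] per particle.
    """
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    jerk = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Relative position and velocity of j as seen from i.
            dr = [positions[j][k] - positions[i][k] for k in range(3)]
            dv = [velocities[j][k] - velocities[i][k] for k in range(3)]
            r2 = sum(c * c for c in dr) + softening ** 2
            rv = sum(dr[k] * dv[k] for k in range(3))
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += masses[j] * dr[k] * inv_r3
                jerk[i][k] += masses[j] * (dv[k] - 3.0 * rv * dr[k] / r2) * inv_r3
    return acc, jerk


if __name__ == "__main__":
    m = [1.0, 1.0]
    pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    vel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(acc_and_jerk(m, pos, vel))
```

The cost grows with the square of the particle count, which is why the abstract describes spreading the work across many nodes and many GPU threads per node.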

Recorded at the GTC Asia Conference on Dec. 15, 2011.

Also posted in Computing Research, Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video

Video: Petaflop Biofluidics Simulations on the TSUBAME 2.0 Supercomputer

In this video, Simone Melchionna, a researcher at the National Research Council’s Institute for Physico-Chemical Processes, presents: Petaflop Biofluidics Simulations on the TSUBAME 2.0 Supercomputer.

We present a computational framework for multi-scale simulations of real-life biofluidic problems, applied to the simulation of blood flow through the human coronary arteries with a spatial resolution comparable with the size of red blood cells, and physiological levels of hematocrit. The simulation on Tsubame 2.0 exhibits excellent scalability up to 4000 GPUs and achieves close to 1 Petaflop aggregate performance, which demonstrates the capability to predict the evolution of biofluidic phenomena of clinical significance. The combination of novel mathematical models, computational algorithms, hardware technology and optimization will be discussed together with an application employed to assess the vulnerability of the coronary network to atherosclerotic plaque build-up to assist clinical decisions.

Recorded at the GPU Technology Conference in Beijing on Dec. 14, 2011.

Also posted in Computing Research, Events, Exascale, GTC - GPU Technology Conference, HPC Hardware, Video


insideHPC.com is a production of insideHPC, LLC. © 2006-2011