A Big Step Toward Exascale Application Programmability and Performance

On May 15, 2012, NVIDIA announced several innovative technologies aimed at improving the performance and energy efficiency of its Kepler GPUs, while opening up a new class of applications and algorithms and creating a larger ecosystem of developers and applications for the GPU landscape.

Two of these innovations in particular caught our attention.

The first is Hyper-Q, a technology that enables multiple CPU cores to simultaneously use the CUDA cores on a single Kepler GPU. The idea is to dramatically increase GPU utilization while slashing CPU idle time and advancing programmability. Hyper-Q is enabled in CUDA 5, and we can expect to see it used heavily with standard MPI codes. Hyper-Q offers great promise for dealing with concurrency issues as we scale to much larger systems.
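To make the idea concrete, here is a minimal sketch of the pattern Hyper-Q is designed to help: several independent kernels launched into separate CUDA streams. This is our own illustration rather than NVIDIA sample code; the kernel name busy_kernel and the stream count are arbitrary choices for the example.

// A minimal Hyper-Q sketch, assuming a Kepler GPU and CUDA 5 (our example,
// not NVIDIA sample code). Eight independent kernels go into eight streams;
// Hyper-Q's multiple hardware work queues let them execute concurrently
// instead of serializing behind a single queue as they could on Fermi.
#include <cuda_runtime.h>

__global__ void busy_kernel(float *data, int n)   // hypothetical workload
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 1000; ++k)
            data[i] = data[i] * 0.5f + 1.0f;      // arbitrary arithmetic to occupy the SMs
}

int main()
{
    const int kStreams = 8;
    const int n = 1 << 16;
    cudaStream_t streams[kStreams];
    float *buffers[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buffers[s], n * sizeof(float));
        // Each launch is independent of the others; with Hyper-Q they can overlap.
        busy_kernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(buffers[s], n);
    }

    cudaDeviceSynchronize();
    for (int s = 0; s < kStreams; ++s) {
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}

The same pattern applies when the work comes from different MPI ranks sharing one GPU, which is exactly where we expect Hyper-Q to pay off for standard MPI codes.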

Community Response Section – Exascale Progress Meter

It was great to meet and shake hands with so many of our readers at the NVIDIA GPU Technology Conference (GTC) in San Jose. For this month’s Community Feedback section, we are presenting the results of our very informal surveys conducted both online and onsite at GTC.

The Exascale Report asked:

What is the most promising development or research agenda you are aware of that represents potential positive impact for HPC advancement and progress toward achieving exascale?

Toward Power-efficient Exascale Systems

From an audio podcast produced for “Research HPC” – a series of programs featuring the voices of Intel Labs.

May 2012
In this article and audio podcast, we discuss Near Threshold Voltage (NTV) processing with Vivek De, an Intel Fellow and Director of Circuit Technology Research at Intel Labs.

When Science Isn’t Enough

NCSA’s Merle Giles is one of those speakers who knows how to hold the attention of his audience. Merle is also an HPC community evangelist who has no problem telling it “like it is” when it comes to discussions of exascale.

He got our attention at the HPCC Conference in Newport, Rhode Island when he gave a compelling talk on the need for an exascale business model.

While we are starting to see signs of technological progress, we are still not having the right discussions about what it will take for exascale to make sense on the business front. As Merle points out, it’s not about ROI. That’s a completely different beast.

Through Dark Clouds of Confusion Come Glimmering Rays of Hope

Over the past year, barrels of ink have been used to describe all the things being done wrong in the quest to reach exascale levels of computation. Disagreement on what direction to take, the evolution vs. revolution argument, has undoubtedly impacted exascale progress, but thankfully has not stifled it completely. Today, we finally have a number of reasons to feel encouraged about that progress as the dark clouds of confusion and skepticism give way to glimmering rays of technological hope.

The first area of promise, a foundational step toward achieving exascale, is being researched and discussed by both Intel and NVIDIA, but our feature article on this topic comes from Intel. We interviewed Intel Fellow Vivek De to discuss Near Threshold Voltage (NTV) processing. NTV is a research area that holds tremendous promise for more efficient power management; it is applicable to numerous future computing applications, ranging from mobile devices to HPC, and is likely to be one of the critical technologies required to enable power-efficient exascale systems. This power management approach, which has been demonstrated with functioning research prototypes, including a solar-powered processor, still has a long way to go. But, in our opinion, NTV is indeed a glimmering ray of hope.

The other ray of sunshine comes from NVIDIA, with some recent announcements made at their GPU Technology Conference. We were impressed with two innovative components of the Kepler GPU called Hyper-Q and Dynamic Parallelism. We tend to refer to these two as one, as they work hand in hand to improve the Kepler GPU’s performance per watt by a factor of 3 to 4 over the previous generation of Fermi GPUs.
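For readers unfamiliar with Dynamic Parallelism, the sketch below shows the basic idea in CUDA terms: a kernel running on the GPU launches another kernel itself, with no round trip through the CPU. This is our own illustrative example rather than NVIDIA code; the kernel names are hypothetical, and it assumes a GK110-class Kepler part (compute capability 3.5 or higher) built with relocatable device code.

// A minimal Dynamic Parallelism sketch (our illustration, assuming a
// compute-capability-3.5+ Kepler GPU and nvcc flags -arch=sm_35 -rdc=true).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void child_kernel(int depth)            // hypothetical child work
{
    printf("child kernel at depth %d, thread %d\n", depth, threadIdx.x);
}

__global__ void parent_kernel()
{
    // The GPU itself decides at run time to spawn more work,
    // instead of returning to the CPU to ask for another launch.
    if (threadIdx.x == 0)
        child_kernel<<<1, 4>>>(1);
}

int main()
{
    parent_kernel<<<1, 32>>>();
    cudaDeviceSynchronize();                        // wait for parent and child grids
    return 0;
}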

Really Fast or Really Productive? Pick One.

The Case for the Graph 500

The Graph 500 is not new, but interest has recently been piqued as discussions shift to application development plans for exascale-class systems. To better prepare for unprecedented data set sizes, many in the community believe we need to change our view of performance measurement from FLOPS to TEPS (Traversed Edges Per Second).
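As a rough illustration of what the metric means, the host-side sketch below scores a breadth-first search the way TEPS is scored: count the edges the traversal touches and divide by the wall-clock time of the search. This is our simplified example, not the Graph 500 reference code, and the official benchmark defines the edge count and timing rules more precisely; the function name bfs_teps is hypothetical.

// Simplified TEPS scoring for one BFS (our sketch, not Graph 500 reference code).
#include <chrono>
#include <queue>
#include <vector>

double bfs_teps(const std::vector<std::vector<int> > &adj, int source)
{
    const auto t0 = std::chrono::steady_clock::now();

    std::vector<bool> visited(adj.size(), false);
    std::queue<int> frontier;
    long long edges_traversed = 0;

    visited[source] = true;
    frontier.push(source);
    while (!frontier.empty()) {
        int v = frontier.front();
        frontier.pop();
        for (int w : adj[v]) {
            ++edges_traversed;              // count every edge the search examines
            if (!visited[w]) {
                visited[w] = true;
                frontier.push(w);
            }
        }
    }

    const double seconds = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
    return edges_traversed / seconds;       // traversed edges per second
}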

The Graph 500 has an impressive steering committee of more than 50 international HPC experts from academia, industry and many of the national laboratories. According to the Graph 500 website, they are in the process of developing comprehensive benchmarks to address three application kernels: concurrent search, optimization (single source shortest path), and edge-oriented (maximal independent set), while attempting to focus on five graph-related business areas: Cybersecurity, Medical Informatics, Data Enrichment, Social Networks, and Symbolic Networks.

Keeping The Community Involved: SIGHPC

Last year at SC11 the Association for Computing Machinery (ACM) launched its latest Special Interest Group (SIG). The new group, SIGHPC, is aimed squarely at the high performance computing community. We talked with John West, one of the founding officers of SIGHPC and Director of the DoD High Performance Computing Modernization Program, to get his view on what the new Special Interest Group is all about, how it’s doing since launch, and how this new organization will contribute to our community.

The Exascale Report: First off, can you tell our readers a little about SIGHPC and how it was started?

John West: SIGHPC really grew out of our community’s experience at the annual SC conference. SC provides an incredibly rich experience each year for attendees from all walks of HPC, from administrators and architects to students and teachers. And while the conference does have some elements that extend beyond the week of the show, these are managed as “special cases.” As the scope and influence of the conference have grown, its leaders found that they were missing a mechanism to engage with the international HPC community year round. This conversation evolved over a few years at the conference, culminating in a town hall meeting at SC10 to gauge interest in, and commitment to, starting a new society.

A Little Planning (ok, a Lot) Goes A Long Way

Back when the Titan supercomputer award was announced (October 2011), there was a slightly embarrassing and awkward period during which Oak Ridge had no solid plan for where the expanded Jaguar system would live. The word on the street was that there was no room at the inn: while the system development had been funded, the budget for facility expansion at the Lab fell through the cracks.

Facilities for exascale-class systems will require innovative, fresh thinking and can’t be treated as an afterthought. With the number of cabinets, processors, racks, cables, and power connections required, the facility work will need to start years ahead of the first systems being available.

The folks at the Lawrence Berkeley National Laboratory (LBNL), demonstrating their usual keen understanding of strategic planning, not to mention the courage of their convictions, have announced plans for a new computing facility capable of eventually housing two exascale supercomputers.

The International Race to Exascale

India Enters the Race While Russia Continues its Silent and Steady March

Two of the most interesting competitors in the exascale race are also the most quiet. The government of India has committed close to $1 billion (USD) toward an advanced supercomputing program with exascale written all over it. The actual details of the plan have not been disclosed, not even to many of the country’s top scientists. This is the largest research program ever funded in India, and yet it is being handled in deep secrecy.

Skeptics say there is a big difference between committing funds and actually spending the money, and claim the money may not even appear for several years.

Designing the Exascale Computer: Race Car or High-Speed Train?

In our last issue of The Exascale Report, we talked about the growing consensus that the global HPC community is, in fact, in a ‘race’ to exascale. And, as in most forms of racing, the stakes for coming in first are getting extremely high.

While the enthusiasm around a global race is not all bad, the world won’t be much better off than we are today if the crowning achievement is reaching exascale levels of computation with a stunt machine that fails to serve the bigger purpose of timely scientific discovery.

Not only will it take massive funding to win the race, it will take innovation and deep collaboration, enveloped in years of preparation, to build an environment that supports critical applications on exascale-class machines.