Urgently Needed: Global Collaboration in HPC


The view from Andy Keane, General Manager, Tesla High Performance Computing Products at NVIDIA.

The world of high performance computing is about to change dramatically. This change isn’t so much about building bigger and better petaFLOPS and eventually exaFLOPS machines. It has more to do with the decisions we make as members of a global community entrusted with a technology that can transform how we live and work, and our relationship with the planet.

It’s not that building more powerful supercomputers is unimportant. We all look forward to the semi-annual TOP500 listings, accompanied by the inevitable chest thumping and justifiable self-congratulation on the part of the top-ranking manufacturer, user, and country of origin.

Since the list was created in 1993, the battle for the top slot has been a seesaw contest between Japanese and U.S. companies like Fujitsu, Hitachi, IBM, and Cray. When the TOP500 ratings were announced at the ISC’10 conference last May, Jaguar, located at DOE’s Oak Ridge Leadership Computing Facility, was ranked number one.

However, as famed baseball pitcher Satchel Paige once observed, “Don’t look back. Something might be gaining on you.” When Jaguar glanced over its shoulder, it saw a Chinese juggernaut hard on its heels. And then, it happened. On Oct. 27, the NY Times broke the story that the number one spot had been captured by a Chinese supercomputer called Tianhe-1A. As of this writing the official numbers have not been published, but according to the article and other sources, Tianhe-1A has 1.4 times the horsepower of Jaguar, with a LINPACK benchmark of 2.507 petaFLOPS and a peak performance of 4.7 petaFLOPS. It also uses about half the power of Jaguar.

So, almost overnight, China has become a major player in the HPC arena with a number of other countries not far behind. Most are using hybrid parallel processors to catapult their supercomputers into the top echelons of the list. The latest official TOP500 rankings, to be published just in time for SC10 in New Orleans, will confirm that the international supercomputer playing field is undergoing a major metamorphosis.

This shift in the balance of supercomputer power has major implications for the U.S. This country’s reliance on traditional CPUs, the technology that once put us out in front of the rest of the world, is now holding us back. Our competitors are using American-developed parallel processing NVIDIA GPUs and multi-core CPUs from Intel or AMD to build their new HPC systems. In the near future there will be more petaFLOPS-class computers outside the U.S. than inside it.

In addition, the U.S. has other HPC-related problems to contend with. For example, at current levels of demand, government research-oriented supercomputers are already 2X oversubscribed. And that’s just a tiny foretaste of what’s to come — by decade’s end, our scientific endeavors will demand a thousand-fold increase in computational capacity.

Now the U.S. is not just rolling over and letting the rest of the world catch up. We have lost none of our fierce competitiveness, and the race to build the fastest, most powerful, cost-effective, and energy-efficient HPC systems is one well worth winning. Just one example: DARPA, with its Ubiquitous High Performance Computing (UHPC) project, is funding efforts to build the technologies necessary to create exascale computers that will be more than 1,000 times more powerful than today’s systems.

Competition or Collaboration?

But as alluring as the rough and tumble of the international HPC marketplace can be, this emphasis on competition can have its downside. NDAs flourish, proprietary information is jealously guarded, and intellectual property is placed under tight lock and key.

Given this seemingly inevitable trend, perhaps we’re missing the point. Maybe it’s not global competition that should be first and foremost. What may be needed most of all is a new level of global collaboration on a scale never before envisioned.

Think about the tasks that supercomputers are superbly engineered to address: the grand challenges that represent some of the planet’s most pressing problems. For example: predicting weather, climate change and global warming; lowering carbon footprints; finding cures for cancer, Alzheimer’s, and other diseases; creating new, clean sources of energy; researching new materials; and understanding the fundamental nature of matter. The list goes on. As NVIDIA CEO Jen-Hsun Huang said at the company’s September GPU Technology Conference, “I really don’t care who cures cancer. I really don’t care who cures Alzheimer’s. It doesn’t matter what country does it. Just please do it!”

HPC systems operating at the petascale — and ultimately exascale — levels have the raw horsepower researchers require to make significant inroads into these and other global problems. But given the complexity and ubiquitous nature of these challenges, collaboration, not competition, is required across the board.

Some of this is already happening on a regional basis in Europe with the European Exascale Software Initiative (EESI) and the Partnership for Advanced Computing in Europe (PRACE). (See the July issue of The Exascale Report, Exascale Development Flows to Europe.) Broader in geographic scope but more narrowly focused, the International Exascale Software Project (IESP) is drawing up an international plan for developing the open source software stack that exascale systems will need.

We are advocating cooperation and collaboration on a much broader scale.

Because the urgent challenges facing humanity are global in scope, we need to rethink our competitive instincts. Instead of pitting geographic regions, countries or companies against each other in a competitive race for prestige, power and revenues, we need to funnel that energy into a worldwide competition in the creation of knowledge.

Imagine a global network of interconnected high performance computers dedicated to solving the world’s greatest challenges through international sharing and collaboration. This approach would create computational nodes around the globe where clusters of the smartest and most innovative scientists, researchers, engineers, teachers and students available would congregate to tackle these problems.

Sure, the teams populating these systems will compete. Competition is healthy — especially a “collaborative competition” dedicated to the advancement of knowledge and finding answers to the biggest problems we have ever confronted.

In the long run, everyone benefits.
