Anyone familiar with DARPA knows the agency is not averse to taking risks. Tackling really tough technological problems by funding innovative research is fundamental to its mission statement.
But with the Ubiquitous High Performance Computing (UHPC) program, DARPA is really pushing the envelope. It’s calling for nothing less than a revolution that could drastically and permanently alter the nature of computing.
In the Broad Agency Announcement (DARPA’s equivalent of an RFP) issued in March of this year, the agency made it clear it is taking no prisoners. The BAA states, “Current evolutionary approaches to progress in computer designs are inadequate. To meet the relentlessly increasing demands for greater performance and higher energy efficiency, revolutionary new computer systems designs will be essential to support new generations of advanced DoD system capabilities and enable new classes of computer application…UHPC systems designs that merely pursue evolutionary development will not be considered.”
DARPA is looking for a 1,000X increase in computing capabilities by 2015.
Response from the HPC community has been predictable. Two opposing camps have sprung up — those who side with DARPA’s call for a totally new computing paradigm versus those who believe the road to exascale should be evolutionary — leveraging what is already in place, moving along the roadmap.
Let’s call it the revolutionaries vs. the incrementalists. And Bill Harrod, DARPA Program Manager for UHPC, makes no bones about where he and the agency stand. “Evolutionary exascale designs will fail,” he said flatly in a presentation last year. As far as the BAA is concerned, incrementalists need not apply.
Huge Hurdles Ahead
UHPC is not targeting exascale per se. Notes Thomas Sterling, Professor of Computer Science at Louisiana State University, “ExaFLOPS performance is never explicitly stated in the definition of the program. However it is implicit in the requirement for a single rack capable of delivering in excess of one petaFLOPS on the Linpack benchmark for a power budget of 57 kilowatts. An added capability is the interoperability of an unspecified number of those single racks in order to address a single application.”
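Dividing those two numbers gives the implied efficiency target. The arithmetic below is simply a back-of-envelope reading of the figures Sterling cites, not a specification drawn from the BAA:

```python
# Implied Linpack energy efficiency of a single UHPC rack, using the figures
# Sterling cites: more than 1 petaFLOPS sustained within a 57 kW power budget.
linpack_flops = 1e15        # 1 petaFLOPS
rack_power_w = 57e3         # 57 kilowatts
gflops_per_watt = linpack_flops / rack_power_w / 1e9
print(f"{gflops_per_watt:.1f} GFLOPS/W at the rack level")   # roughly 17.5 GFLOPS/W
```

That rack-level figure sits below the 50 gigaFLOPS/watt efficiency goal discussed later, presumably because the 57 kilowatt budget has to cover memory, interconnect and everything else in the rack, not just the floating point units.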
Sterling, who is best known as the father of Beowulf clusters and for his research on petaFLOPS computing architecture, has been very involved in the research that led to the DARPA proposal. And he comes down squarely on the side of the revolutionaries. “The program,” he says, “is an extraordinary call to arms, an alert to the entire HPC community.”
Contract awards for the program are expected to be announced shortly.
Sterling and his colleagues — people like Jack Dongarra, co-founder and co-principal investigator for the International Exascale Software Project (IESP); Peter Kogge, McCourtney Chair in Computer Science and Engineering at the University of Notre Dame; and Vivek Sarkar, Professor in the Department of Computer Science at Rice University, who led the DARPA Exascale Software Study — have their work cut out for them. The problems the program will tackle include parallelism, reliability, user productivity, and, most importantly, power and energy.
DARPA points out that current processing systems are “grossly power-inefficient and typically deliver only a small fraction of peak performance.” Pushing transistor counts higher and operating voltages lower, at the cost of rising rates of permanent and transient faults and greater variation between devices on the same die, has not brought energy costs down. As long ago as 2007, the EPA reported to Congress that if data center and server energy trends continued, annual consumption would exceed 100 billion kWh, representing an annual electricity cost of $7.4 billion.
DARPA also acknowledges that Moore’s Law, our faithful predictor of processor performance advances for more than four decades, is finally sputtering to a halt. “Until recently,” the agency wrote in the UHPC BAA, “advances in Commercial Off-The-Shelf (COTS) systems performance were enabled by increases in clock speed, decreases in supply voltage, and growth in transistor count. These technology trends have reached a performance wall where increasing clock speed results in unacceptably large power increases, and decreasing voltage causes increasing susceptibility to transient and permanent errors.”
Sterling adds that after two decades of HPC constancy based on the Communicating Sequential Processes (CSP) model and microprocessor design enhancements — along with tweaks to existing methods such as clusters and MPI — a new approach is needed, one covering both future designs and their means of operation. The goal is to realize practical extreme capability across a broad range of system scales (hence “ubiquitous”) and data-intensive application domains, e.g., future graph problems, which will be pervasive in real-world applications and can be described as massive, dynamic, and incorporating high-dimensional data.
So DARPA has opened its purse strings to bring together some of the best minds in the HPC community. Their task is to develop radically new computer systems that exhibit such advanced characteristics as:
- Efficiency — Hardware and software co-design at the systems level to minimize energy dissipation and maximize energy efficiency, with a 50 gigaFLOPS/watt goal, without sacrificing the scalability needed to handle ultra-high performance DoD applications. That works out to an average total energy of 20 pJ per floating point operation, as the sketch after this list shows. (See sidebar: “The Problem of Power.”)
- Programmability — New technologies and execution models that do not require application programmers to get bogged down in system complexity, such as architectural attributes affecting data locality and concurrency. These solutions should help, not hinder, programmers in meeting their performance and time-to-solution goals.
- Concurrency — Radically new technology, routinely handled by system-level programmers, that manages hardware and software concurrency in order to minimize overhead for the >10 billion-way parallelism needed to hide latency and assure maximum throughput (the sketch after this list shows how parallelism on that order arises).
- Resiliency — A system-wide approach to design that achieves high levels of reliability and security by employing fault management techniques that enable applications to execute correctly despite failures and attacks.
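To make those numbers concrete, here is a back-of-envelope sketch of where the two headline figures come from. The 50 gigaFLOPS/watt goal and the >10 billion-way parallelism requirement are from the program description; the exascale operation rate and the latency value used below are illustrative assumptions, not UHPC specifications.

```python
# Back-of-envelope arithmetic for the UHPC efficiency and concurrency targets.
# The 50 GFLOPS/W goal and the >10 billion-way parallelism figure come from the
# program description; the rate and latency below are illustrative assumptions.

# Energy per operation implied by 50 gigaFLOPS per watt:
flops_per_watt = 50e9                      # 50 GFLOPS/W goal
joules_per_flop = 1.0 / flops_per_watt     # energy per floating point operation
print(f"{joules_per_flop * 1e12:.0f} pJ per flop")    # -> 20 pJ

# Concurrency needed to hide latency (Little's law: concurrency = rate x latency).
# Assume an exascale-class rate of 1e18 ops/s and ~10 ns average operation latency,
# both illustrative, giving the order of magnitude cited above.
rate_ops_per_s = 1e18
latency_s = 10e-9
in_flight = rate_ops_per_s * latency_s
print(f"{in_flight:.0e} operations in flight")        # -> 1e+10 (10 billion-way)
```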
Advanced Applications
The payoff for these efforts is multifaceted. Being able to scale up today’s applications — think materials, chemistry, computational fluid dynamics, impact scenarios and phenomena — is a major benefit. But DARPA has even more ambitious goals in its crosshairs: applications that will require truly remarkable improvements to be of real value.
Here’s Sterling’s take on potential future apps: “DARPA is also looking at classes of applications that require truly remarkable improvements in performance to be of realistic value. At the small scale are streaming applications for military reconnaissance, command and control, synthetic aperture radar, passive sonar, and automatic target recognition to name a few using highly mobile, long energy lifetime autonomous computing platforms.
“DARPA is also looking towards strategic application of knowledge management and therefore the underlying graph problems they require,” he continues. “This includes breaking into the new frontiers of dynamic graph processing, a very different domain than conventional matrix computing. Machine intelligence may be a byproduct of such advances with symbols represented by vertices and relations captured by their interconnecting links. Not only may these include large ontologies for natural language processing or social networks for national security needs, they may also include rapidly changing graph problems for hypothesis testing under noisy conditions, decision making for rapid command and control, pattern matching for target recognition, theorem proving, and alternative game scenarios for strategic and tactical planning. An ultimate objective may be machine intelligence for truly autonomous vehicles in the air, on the ground, and under the sea. There is a sense that perhaps the most important applications may ultimately be those unimagined at this time, but that future innovation will inspire in ways we have as yet to consider.”
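For readers unfamiliar with the distinction Sterling draws between dynamic graph processing and conventional matrix computing, the sketch below shows the basic shape of such a workload: symbols as vertices, labeled relations as edges, and a structure that mutates as evidence streams in. It is purely illustrative; every entity and relation name is invented.

```python
# Minimal sketch of a dynamic knowledge graph of the kind Sterling describes:
# symbols are vertices, relations are labeled edges, and the graph mutates as
# new observations arrive. All names and relations here are invented.
from collections import defaultdict

class DynamicGraph:
    def __init__(self):
        # vertex -> list of (relation, neighbor) pairs
        self.adj = defaultdict(list)

    def add_relation(self, subject, relation, obj):
        """Insert a labeled edge; vertices are created implicitly."""
        self.adj[subject].append((relation, obj))

    def remove_vertex(self, vertex):
        """Retract a symbol and every relation that touches it."""
        self.adj.pop(vertex, None)
        for edges in self.adj.values():
            edges[:] = [(r, v) for (r, v) in edges if v != vertex]

    def neighbors(self, vertex, relation=None):
        """Follow outgoing relations, optionally filtered by label."""
        return [v for (r, v) in self.adj[vertex] if relation is None or r == relation]

# Streaming updates: the graph changes shape between queries, which is what
# distinguishes this workload from conventional static matrix computation.
g = DynamicGraph()
g.add_relation("contact_A", "communicates_with", "contact_B")
g.add_relation("contact_B", "located_in", "region_X")
print(g.neighbors("contact_A"))          # ['contact_B']
g.remove_vertex("contact_B")             # hypothesis retracted; graph updates
print(g.neighbors("contact_A"))          # []
```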
A slide from Harrod’s presentation illustrates some of the applications and benefits that DoD expects to realize from UHPC.
The Risky Art of Revolution
One of the major prerequisites of the UHPC program is the establishment of what the agency terms an “open innovation environment.”
As defined by the program, open innovation assumes an openness on the part of organizations — including profit-driven, highly competitive commercial enterprises — to adopting both internal and external ideas and paths to market, all taking place in an environment where researchers can “openly exchange, debate and formulate revolutionary ideas concerning computer hardware, software and applications technology.” Collaboration is expected not only among the diverse project teams, but may also extend to researchers not funded by the UHPC program.
Sterling, part of the UHPC TA2 team chartered with fostering cooperation among the many disparate entities and organizations involved in the project, recognizes the potential speed bumps. “There is no question that there is a tension between investment in vendor roadmaps and a resistance to alternative means,” he says. “Even recently I’ve been told by various senior technical leaders of different vendors that economics and mass market will drive component solutions rather than extreme scale computing. But this has always been historically so until something new and valuable comes along in spite of their roadmaps. Then they jump on board and make it part of their future product line…Open innovation is a gutsy strategy and the only one that is likely to achieve its goal. I consider this an enlightened program, rare in inception, accurate in its direction.”
Chris Willard, Chief Research Officer of the HPC-oriented research firm InterSect360, adds, “The reason industry will not be able to solve these problems on their own, and DARPA has to take the initiative, is that the manufacturers see developing new supercomputing technologies — such as those outlined by the UHPC program — as too high a risk, or too low a return, or both. This is the difference between anyone interested in making ‘quantum leaps’ in technology and those people who are looking to run a profitable business. The business people are deciding where to invest their money in the short term. Given this perspective, it makes more sense to invest in incremental improvements in what already exists.”
Sterling has an interesting take on the risks involved with adopting incrementalism vs. revolution. An incremental approach to a technological challenge as complex and demanding as UHPC has a major chance of falling short, he asserts. To be sure, if you could meet the DARPA specs incrementally, your risk is low. But if it can’t be done — if it is just about impossible to achieve extreme, ubiquitous computing verging on exaFLOPS in the DARPA time frame by using approaches that reflect conventional practices, then incrementalism is far riskier than jump starting a revolution.
“I think computer technology is at a breaking point, a point of punctuated equilibrium that demands a new set of relationships, a new paradigm,” he says. “UHPC and the question of evolution vs. revolution are generating strong feelings in the community. Some feel that the vendors will achieve exascale goals by taking an incremental, evolutionary approach, and that DARPA is causing a distraction by asking the manufacturers to do something that is counterproductive to their roadmap. However, if this really is a major turning point, they are wrong.”
So, man the barricades and let the revolution begin.
The Problem of Power
How embarrassing. It’s 2015. You crank up your shiny new exascale system, run your memory interfaces and floating point unit at full performance — and the system melts.
Energy and power are the 800-pound gorilla in the exascale room — probably the biggest stumbling block on the road to UHPC and to the promise of exascale that lies beyond.
Today’s processors run at thousands of pJ per operation. To achieve the UHPC goal of 50 gigaFLOPS/watt, they need to be operating at only tens of pJ per floating point operation. The DARPA-sponsored exascale technology study chaired by Dr. Peter Kogge of the University of Notre Dame stated unequivocally that conventional systems will not meet UHPC power goals by 2015. Instead, radical new designs are needed, particularly to cope with the power-hungry transport of data “from one site to another — on the same chip, between closely coupled chips in a common package, or between different racks on opposite sides of a large machine room, or on storing data in the aggregate memory hierarchy.”
There’s more to power consumption than avoiding turning your system into a heap of molten slag. As the IESP roadmap points out, every megawatt of reduced power consumption means a savings of at least $1M/year. Current state-of-the-art power management systems are based on advances in the consumer-electronic and laptop markets.
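The roadmap’s rule of thumb is easy to check. In the sketch below, the electricity price is an assumed commercial rate rather than a figure taken from the IESP roadmap:

```python
# Rough check of the IESP rule of thumb that each megawatt saved is worth
# roughly $1M per year. The $0.12/kWh rate is an assumed commercial electricity
# price, not a number taken from the roadmap.
megawatts_saved = 1.0
hours_per_year = 24 * 365                 # 8,760 hours
price_per_kwh = 0.12                      # assumed $/kWh
annual_savings = megawatts_saved * 1e3 * hours_per_year * price_per_kwh
print(f"${annual_savings:,.0f} per MW-year")   # about $1,051,200
```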
Says the agency, “Unfortunately, the technology to collect information across large-scale systems, make control decisions that coordinate power management decisions across the system, and reduced models of code performance for optimal control are not well developed. Furthermore, the interfaces for representing sensor data for the control system, interfaces to describe policies to the control system, and to distribute control decisions are not available at scale. Effective system-wide power management will require development of interface standards to enable both vertical (e.g. between local components and integrated systems) and horizontal integration (e.g. between numerical libraries) of components in a complete solution. Standardization is a minimum requirement for broad international collaboration on (the) development of software components. The research and development effort required to bring these technologies into existence will touch on nearly every element of a large-scale computing design — from library and algorithm design to system management software.”
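No such standardized interfaces exist today, which is the agency’s point. Purely as an illustration of the shape a system-wide control loop might take, here is a minimal sketch; every class, field, and policy in it is hypothetical.

```python
# Hypothetical sketch of a system-wide power management control loop of the kind
# the quote calls for: collect sensor data across components, apply a declared
# policy, and distribute control decisions back down. Every name, interface, and
# policy here is invented for illustration; no such standard exists today.
from dataclasses import dataclass

@dataclass
class SensorReading:
    component_id: str      # e.g. a node, memory stack, or interconnect link
    watts: float           # measured draw
    utilization: float     # 0.0 - 1.0

@dataclass
class PowerPolicy:
    system_budget_watts: float   # global cap set by the facility or operator

def plan_power_caps(readings, policy):
    """Toy 'vertical integration': turn global policy plus local sensor data
    into per-component power caps, weighting components by utilization."""
    total_util = sum(r.utilization for r in readings) or 1.0
    return {
        r.component_id: policy.system_budget_watts * (r.utilization / total_util)
        for r in readings
    }

# One iteration of the control loop with made-up telemetry.
readings = [
    SensorReading("node-0001", watts=310.0, utilization=0.9),
    SensorReading("node-0002", watts=290.0, utilization=0.3),
]
caps = plan_power_caps(readings, PowerPolicy(system_budget_watts=500.0))
print(caps)   # busier components get a larger share of the global budget
```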