UHPC Revisited: An Interview with DARPA Program Manager Bill Harrod


Sometimes revolutions start very quietly. DARPA’s Ubiquitous High Performance Computing (UHPC) program seems to fall into this category.

Last March the agency issued a Broad Agency Announcement (BAA) requesting proposals from the HPC community to help develop extremely high performance computer systems using novel design approaches.

The goal is to achieve by 2015 a 1,000-fold increase in capabilities over today’s most powerful supercomputers, such as Jaguar at Oak Ridge National Laboratory, which has a theoretical peak of 2.3 petaFLOPS and nearly a quarter of a million computational cores.

(For a detailed look at what DARPA is hoping to achieve with the UHPC program, see the July issue of The Exascale Report, “DARPA and UHPC: Jump Starting a Revolution.”)

Haltingly, the names of the winners trickled out, accompanied by upbeat but mercifully brief and understandably vague pronouncements from spokespeople for the government agencies, universities and companies that will be working on the program.

According to DARPA, the agency has awarded contracts to Intel Corporation (Hillsboro, OR), NVIDIA Corporation (Santa Clara, CA), the Massachusetts Institute of Technology (Cambridge, MA), and Sandia National Laboratories (Albuquerque, NM). They will lead the development teams that design and build prototype computers for the UHPC program. The Georgia Institute of Technology (Atlanta, GA) was selected to lead the Applications, Benchmarks and Metrics (ABM) team for UHPC.

Also participating is LSU’s Center for Computation & Technology (CCT). CCT’s Thomas Sterling and his research group will focus on execution models, runtime system software, memory system architecture, and symbolic applications.

An exaFLOPS machine per se is not the Holy Grail DARPA is seeking. As Sterling notes in the July issue of The Exascale Report, “ExaFLOPS performance is never explicitly stated in the definition of the program. However it is implicit in the requirement for a single rack capable of delivering in excess of one petaFLOPS on the Linpack benchmark for a power budget of 57 kilowatts. An added capability is the interoperability of an unspecified number of those single racks in order to address a single application.”

In order to obtain more information about the UHPC program directly from DARPA, we submitted a series of questions to Dr. William Harrod, the agency’s Program Manager for UHPC.

Briefly describe the UHPC initiative. What are its key goals?

Bill Harrod: The UHPC system will deliver 1 peta-ops in a single cabinet, consuming less than 57 kW including system cooling, with an energy efficiency of 50 GFLOPS per watt.

For comparison, the most energy-efficient supercomputer on the latest Green500 list delivers 0.722 GFLOPS per watt. The UHPC program will develop new, secure, scalable, energy-efficient system architectures that will not require application programmers to explicitly manage system complexity and extreme parallelism in order to develop computationally efficient applications.
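A quick back-of-the-envelope calculation shows how those two targets fit together. The sketch below assumes the 50 GFLOPS-per-watt figure applies to computation alone, with the remainder of the 57 kW envelope left for memory, interconnect and cooling; that split is our inference, not a stated program figure.

```python
# Back-of-the-envelope check on the UHPC cabinet goals. The compute/headroom
# split is our inference from the stated targets, not a program figure.
PEAK_FLOPS = 1e15        # 1 petaFLOPS per cabinet
EFFICIENCY = 50e9        # 50 GFLOPS per watt
BUDGET_W = 57e3          # 57 kW total, including cooling

compute_w = PEAK_FLOPS / EFFICIENCY   # power needed at target efficiency
headroom_w = BUDGET_W - compute_w     # left for memory, I/O, cooling

print(f"compute: {compute_w/1e3:.0f} kW, headroom: {headroom_w/1e3:.0f} kW")
# -> compute: 20 kW, headroom: 37 kW
```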

The UHPC program system designs will be scalable from embedded terascale systems up through at least single cabinet petascale configurations and provide the building blocks and framework for future exascale computers. To help achieve these goals, the DARPA UHPC Program will implement an open collaborative research environment.

The DARPA-sponsored Exascale Working Group, whose objective was to identify the major challenges to achieving a 1,000-fold increase in the computational capabilities of computing systems by 2015, reported that excess overhead, data movement and storage waste a significant percentage of the overall energy used in today’s HPC architectures. The key principle behind the UHPC design efforts is therefore to aggressively minimize the energy used for all types of data operations, including data movement and overhead. Minimizing data movement can only be accomplished through new execution models, and minimizing overhead would also significantly improve the effective programmability of the computer.
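A toy cost model makes the point concrete. The per-operation energies below are illustrative placeholders rather than measured values for any real machine, but the ratio, with an off-chip access costing orders of magnitude more than an arithmetic operation, is what drives the design principle Harrod describes.

```python
# Toy energy model: arithmetic vs. data movement. The picojoule figures are
# illustrative assumptions, not measured values for any real system.
ENERGY_FLOP_PJ = 10      # one double-precision operation (assumed)
ENERGY_DRAM_PJ = 2000    # one operand fetched from off-chip DRAM (assumed)

def kernel_energy_pj(flops: int, dram_words: int) -> int:
    """Total energy for a kernel with the given operation counts."""
    return flops * ENERGY_FLOP_PJ + dram_words * ENERGY_DRAM_PJ

n = 1_000_000
# Dot product streaming both vectors from DRAM: 2n flops, 2n loads.
streaming = kernel_energy_pj(2 * n, 2 * n)
# Same arithmetic with one vector held on chip: DRAM traffic is halved.
blocked = kernel_energy_pj(2 * n, n)
print(f"streaming: {streaming/1e6:.0f} uJ, blocked: {blocked/1e6:.0f} uJ")
# Halving the DRAM traffic nearly halves the energy; the flops barely matter.
```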

What are the major problems with current and future HPC systems that motivated DARPA to issue the BAA? For example, problems associated with power and energy, cyber resiliency, parallelism and user productivity.

Harrod: The architectural advances required to build ExtremeScale computers face many significant hurdles, which were investigated and identified in the DARPA Exascale (ExtremeScale) Study. To achieve the goal of building these ExtremeScale computer systems, the challenges below must be addressed concurrently:

  • The Energy and Power Challenge is the most pervasive challenge. A key observation is that it will be easier to solve the power problem associated with computation than to reduce the problem associated with transporting data.
  • The Memory and Storage Challenge concerns the lack of components that will support a variety of DoD application suites at the desired computational rate and still fit within an acceptable power envelope.
  • The Concurrency and Locality Challenge grows out of the leveling off of silicon clock rates, which has ended the growth of single-thread performance and left parallelism as the only mechanism for increasing overall system performance.
  • The Resiliency (Dependability) Challenge grows out of the explosive growth in component count and the fact that components will be operated at lower voltage levels, where individual devices and circuits become increasingly sensitive to their local operating environment and more susceptible to error and system failure.

These challenges cannot be pursued independently or at the component level. They must be addressed as an integrated software and hardware co-design solution.
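The Concurrency and Locality Challenge above is easy to quantify with Amdahl’s law: once clock rates stop rising, any residual serial fraction caps the achievable speedup no matter how many cores are added. The fractions in this sketch are illustrative, not program requirements.

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n) for parallel fraction p on
# n cores. The fractions below are illustrative assumptions.
def amdahl_speedup(p: float, cores: int) -> float:
    return 1.0 / ((1.0 - p) + p / cores)

for p in (0.99, 0.999, 0.9999):
    print(f"parallel fraction {p}: {amdahl_speedup(p, 1_000_000):,.0f}x "
          "on a million cores")
# Even code that is 99.99% parallel tops out near 10,000x.
```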

What are some of the key present and future applications of interest to DARPA that require exascale capabilities?

Harrod: There are five UHPC Challenge Problems that will drive the development of the UHPC system designs. They are intended to be representative of the full range of modern DoD application codes.

UHPC Challenge Problems:

  • Massive streaming sensor data problem resulting in actionable knowledge
  • Large dynamic graph-based informatics problem (illustrated in the sketch below)
  • Decision class problem that encompasses search, hypothesis testing and planning
  • Two challenge problems to be selected from the HPCMOD Benchmark Suite or the DoD CREATE Program
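To give a flavor of the graph-informatics class, a minimal traversal kernel over an adjacency list appears below. The representation and scale are our illustrative choices, not part of the UHPC problem definitions; the irregular, pointer-chasing access pattern is what makes this class of problem stress memory systems rather than floating-point units.

```python
# Minimal sketch of a graph-informatics kernel: breadth-first levels over an
# adjacency list. The data structure and graph are illustrative choices only.
from collections import deque

def bfs_levels(graph: dict, source) -> dict:
    """Map each vertex reachable from `source` to its BFS level."""
    level = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for w in graph.get(v, ()):
            if w not in level:            # first visit fixes the level
                level[w] = level[v] + 1
                frontier.append(w)
    return level

g = {0: [1, 2], 1: [3], 2: [3], 3: [4]}   # toy graph; real ones are dynamic
print(bfs_levels(g, 0))                   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```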

The word “revolutionary” appears quite frequently in the DARPA RFI and BAA, along with the comment that proposals based on existing technologies (those that take an evolutionary approach) will not be considered. This appears to be a call for a fundamental revolution in HPC design by the end of the decade. Just how high is the risk factor in this program?

Harrod: The overall risk to achieve all goals is high, but perceived as achievable by the selected performers. The risk associated with developing a computer that can achieve one PFLOPS in a single cabinet and 50 GFLOPS/W is considerably lower than the risk associated with developing a system that is reliable and highly programmable.

How about achieving the ambitious software and resiliency goals?

Harrod: Co-design of hardware and software will significantly reduce the risk associated with software and resiliency goals. Access to prototype systems in Phases 3 and 4 will provide an environment to evaluate and further advance software and resiliency goals, which will go further toward reducing this risk.

Will the winning research teams coordinate with the International Exascale Software Project and the DoE’s X-Stack program?

Harrod: I have already been participating in activities with the DoE and its exascale programs, and the UHPC program will be coordinated with these communities. Furthermore, numerous performers in the UHPC program are participants in the International Exascale Software Project, and UHPC expects to coordinate with these exascale efforts as well as with other agencies.

Do you expect revolutionary solutions to be put forward by teams led by the traditional HPC vendors or will new organizations that have not been tied to older ways of working (e.g., the message passing model) spring up to offer new, unexpected solutions? Or both?

Harrod: Three of the four development teams are led by non-traditional HPC companies or organizations. We anticipate these teams will develop systems that will break the stranglehold imposed by MPI.

How does this fit in with DARPA’s conception of an open innovation environment?

Harrod: The UHPC program is the first program at DARPA to use the concept of an open collaborative research environment. Teams will participate in shared research efforts to attack the very aggressive program vision and goals. The UHPC collaborative environment will provide a forum where researchers can openly exchange, debate and formulate revolutionary ideas. Each team’s efforts are expected to result in a unique design. These problems are of such a significant nature that one team cannot fully solve them!

You are also asking for a paradigm shift in software. Please tell me more about the characteristics of extreme scale software and self-aware systems. What is the new execution model that will be required to create self-aware, secure, efficient and dependable exascale systems?

Harrod: A new model of computation, or execution model, must be developed to enable the programmer to perceive the system as a parallel computer system, not as a collection of microprocessors with memory and an interconnection network.

Current operating systems have pre-programmed behaviors that are based on estimates of resource performance and availability. What is needed are operating systems and run-time environments that behave as self-aware systems.
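Harrod does not spell out what such a runtime would look like, but the core idea, a feedback loop that observes the machine and adapts rather than following pre-programmed estimates, can be sketched. Everything below, the sensor, the power budget and the policy, is a hypothetical illustration, not a UHPC interface.

```python
# Hedged sketch of a "self-aware" runtime loop: observe, decide, act.
# The sensor, budget and policy here are hypothetical illustrations.
import random

def read_power_w() -> float:
    """Stand-in for a hardware power sensor."""
    return random.uniform(15_000, 25_000)

def adapt(threads: int, power_w: float, budget_w: float = 20_000.0) -> int:
    """Throttle parallelism over budget; widen it when power is to spare."""
    if power_w > budget_w:
        return max(1, threads - 1)   # back off to stay inside the envelope
    return threads + 1               # spare power: exploit more concurrency

threads = 8
for step in range(5):
    threads = adapt(threads, read_power_w())
    print(f"step {step}: {threads} threads")
```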

What impact do you think this program will have on the development of future commercial HPC systems for scientific and business use?

Harrod: There is no requirement that the UHPC teams commercialize their prototype computers. However, it is expected that these systems will have a very significant impact on future products from the commercial teams.