International Exascale Workshop Culminates with US-Japan Collaboration Agreement

In this special guest feature, Michael Feldman from Intersect360 Research provides exclusive coverage of the Workshop on International Cooperation for Extreme-Scale Computing at ISC’14.

On June 22, the US Department of Energy (DOE) and Japan’s Ministry of Education, Culture, Sports, Science and Technology (MEXT) signed an agreement to collaborate on exascale supercomputing technologies for the scientific community. The idea is not to build some sort of US-Japan hybrid super; instead, the goal is to co-develop exascale-capable system software. The official purpose reads: “Work together where it is mutually beneficial to expand the HPC ecosystem and improve system capability.”

Yoshio Kawaguchi and Shinya Tahata from MEXT and William Harrod, representing the US DOE, signed the agreement in a ceremony that took place at ISC’14, at the conclusion of a workshop on extreme-scale computing. Indiana University’s Thomas Sterling called the signing a “pivotal moment” and hoped there would be other international collaborations along similar lines.

Both the United States and Japan are working independently to field their own exascale systems, but certain areas, system software in particular, are of common interest to all efforts, and shared work on them can help accelerate such development globally. Japan is perhaps the furthest along in its exascale plans, having formulated a roadmap that puts its first exaflop-capable machine in production by 2020. The US, or at least the DOE, is aiming for 2022-2023 or thereabouts.

Technically, the software development work under the agreement applies to all current and future supercomputers, not just exascale systems. But since the size of these upcoming machines will cause them to exceed the capabilities of much of the system software in use today, it is these future supercomputers that are now driving new requirements.

Specifically, the collaborative development will cover the following areas:

  • Kernel system programming interfaces
  • Low-level communication layers
  • Task and thread management to support massive concurrency
  • Power management and optimization
  • Data staging and I/O bottlenecks
  • File system and I/O management
  • Improving system and application resilience to chip failures and other faults
  • Mini-Applications for exascale component-based performance modeling

In a nutshell, the plan is to build a common OS kernel that can be used by all post-petascale systems, regardless of hardware eccentricities. Although the processor, memory, interconnect and storage infrastructure for such machines is bound to differ at the component level (chip architecture, core counts, network topology, storage architecture, etc.), the goal here is to design OS plumbing that is independent of any hardware considerations.

The agreement also establishes a “Committee on Computer Science and Software” to coordinate the activities and development work between the two nations. Scientists, engineers, graduate and post-graduate students will be exchanged on a regular basis to promote transfer of knowledge and experience.

It’s worth noting that no European or other Asian countries were signatories on the collaboration agreement, although all exascale programs could theoretically benefit from the development that will take place under the US-Japan partnership. In particular, if the work manages to create a standardized set of APIs for an exascale-capable OS, it is certainly conceivable they will be adopted by other organizations.

The exclusion of other partners in this agreement is somewhat curious, especially given that the workshop itself also attracted participants from China, Taiwan, Australia, the UK, Germany, Spain, Switzerland and South Africa. The June 2014 meeting represents the second year of the workshop and was chaired by James Ang of Sandia National Laboratories, Pete Beckman of Argonne National Lab, and Thomas Sterling of Indiana University.

In addition to the US-Japan agreement, other workshop topics included presentations on power management, fault tolerance (including alternative checkpointing schemes), developments relating to the Square Kilometer Array (SKA) project in South Africa, the UK’s position on exascale investment, the status of the US DOE exascale effort, and other international work in developing extreme-scale software.