Berkeley Lab recently hosted the fourth annual X-Stack PI event, where X-Stack researchers, facilities teams, application scientists, and developers from national labs, universities, and industry met to share the latest developments in X-Stack application codes.
X-Stack was launched in 2012 by the U.S. Department of Energy’s Advanced Scientific Computing Research program to support the development of exascale software tools, including programming languages and libraries, compilers and runtime systems. These tools are intended to help programmers handle massive parallelism, data movement, heterogeneity and failures as the scientific community transitions to the next generation of extreme-scale supercomputers. Nine X-Stack projects were designated to develop complete solutions that address multiple components of the system software stack: DEGAS, D-TEC, XPRESS, Traleika, DynAX, XTUNE, GVR, CORVETTE and SLEEC.
During the first three years of the program, these projects completed research and development of programming models, programming environments and runtime systems for exascale. During the fourth year, which began in September 2015, the development teams are extending their results and delivering additional benefits to application codes.
The goal for this year’s X-Stack PI meeting was to demonstrate the latest advances in the codes, with an eye toward delivery in the latter part of 2016. Toward this end, a Technology Marketplace held during the April meeting gave developers the opportunity to demo their software prototypes. A total of 20 demonstrations were given during the two-hour marketplace event, and NERSC resources enabled 15 of the teams to show emerging exascale technologies still in the development phase.
“As part of the PI meeting, NERSC reserved all of Cori and 1,000 nodes (32,000+ cores) of Edison to allow the computer scientists to demonstrate their technologies at scale,” said Alice Koniges, the NERSC PI on the XPRESS project who organized the X-Stack demos. “Some projects ran directly on the NERSC machines during the meeting demo period, while others collected results prior to the meeting and used special X-Stack-developed tools to analyze and interpret that data.” She credited Richard Gerber, NERSC’s senior science advisor, with obtaining the allocations on Cori and Edison for the X-Stack teams.
Some of the demos that required GPU technologies were run remotely on Titan at the Oak Ridge Leadership Computing Facility, which also set up a special reservation for the Technology Marketplace, she added. In addition, a pre-release Intel machine with Cori-2 prototype hardware was made available to the demo teams through an agreement with Sandia National Laboratories.
Here are some highlights from the X-Stack meeting Technology Marketplace demonstrations:
DEGAS: Leveraging HipMer Extreme Scale Genome Assembler via a NERSC Web Portal. De novo assembly is a key computational method for reconstructing an unknown genome, but existing assemblers are limited by slow runtimes and poor scalability. To address this, a team of Berkeley Lab and UC Berkeley researchers developed HipMer, the first end-to-end HPC parallelization of Meraculous, a cutting-edge de novo genome assembly tool developed by the Joint Genome Institute. By applying novel algorithms, computational techniques and the Unified Parallel C programming language to Meraculous, they have reduced the genome assembly process from days to minutes.
During the X-Stack meeting, Lenny Oliker and Steve Hofmeyr of Berkeley Lab’s Computational Research Division presented a web portal interface being implemented at NERSC that will allow the external bioinformatics and computational research community to remotely leverage DEGAS’ scalable de novo assembly capabilities.
The DEGAS team is a joint California/Texas effort that includes Berkeley Lab, Rice University, the University of Texas at Austin, UC Berkeley and Lawrence Livermore National Laboratory (LLNL).
D-TEC and Stencil Computations. Two of the X-Stack demos featured D-TEC work on stencil computations. The first involved Halide, a stencil domain-specific language (DSL) that offers portable, high-performance execution of stencil pipelines. A programmer writes an algorithm only once and then uses a separate, high-level scheduling language to optimize its performance for different platforms.
At the X-Stack meeting, Riyadh Baghdadi, a postdoc at MIT, demoed an image processing application written in Halide and running on three different architectures: a shared-memory parallel system, a GPU and NERSC’s Cori system. Converting nine Halide image processing pipelines to distributed execution required approximately 15 new lines of code, and several of the pipelines exhibited near-linear scaling up to 16,000 cores on Cori.
A second D-TEC demonstration involved the X10 programming language, a simple, clean, powerful and practical language for scale-out computation using the asynchronous partitioned global address space (APGAS) model. The D-TEC team demonstrated how control structure overloading can be used to implement efficient parallel iteration, including tiling patterns for stencil computation. They also presented results for the LULESH hydrodynamics proxy application, comparing the X10 implementation with the OpenMP/C++/MPI implementation. In this example, the team found that the X10 code was 40 percent shorter and also significantly faster when run on up to 1,024 nodes of NERSC’s Edison system.
The D-TEC team comprises researchers from Berkeley Lab (Phil Colella), LLNL, MIT, Rice University, IBM, Ohio State University, UC Berkeley, University of Oregon and UC San Diego.
XPRESS: HPX-5 Integrated APEX. Among the X-Stack codes being developed by the XPRESS (eXascale Programming Environment and System Software) team is HPX-5 (High Performance ParalleX), an open source, portable, performance-oriented runtime developed at CREST (Indiana University). HPX-5 provides a distributed programming model that allows programs to run unmodified on systems ranging from a single SMP to large clusters and supercomputers with thousands of nodes.
For the X-Stack demo at Berkeley Lab, the XPRESS team showed the scalability of HPX-5 integrated with the Autonomic Performance Environment for eXascale (APEX). The demonstration also showed the LULESH application running on NERSC’s Cori system using the Photon communication library, which tightly couples the runtime system with the underlying network fabric and is designed to scale and remain performant in exascale environments.
In addition to Berkeley Lab, the XPRESS team comprises researchers and computational scientists from Sandia National Laboratories, Indiana University, Louisiana State University, Oak Ridge National Laboratory, University of Houston, University of North Carolina at Chapel Hill and University of Oregon.
Traleika: Intel Open Community Runtime (OCR) Tools and Applications. The Open Community Runtime project, which is supported in part by the Traleika Glacier X-Stack program, is creating a runtime system framework that explores new programming methods for machines with high core counts. The initial focus is on HPC applications. OCR is an open-source project that includes components for task scheduling and resource mapping in homogeneous, heterogeneous and distributed environments.
During the Technology Marketplace, the Intel X-Stack team demonstrated the Open Community Runtime tools and applications, running a mixture of applications and kernels—including HPCG, CoMD and 2D stencils—on 1,000 nodes of NERSC’s Cori and Edison systems.
In addition to Intel, the Traleika team includes Reservoir Labs, UC San Diego, Rice University, University of Illinois at Urbana-Champaign and Pacific Northwest National Laboratory.