Let’s Talk Exascale: Software Ecosystem for High-Performance Numerical Libraries

Lois Curfman McInnes from Argonne

In this Let’s Talk Exascale podcast, Lois Curfman McInnes from Argonne National Laboratory describes the Extreme-scale Scientific Software Development Kit (xSDK) for ECP, which is working toward a software ecosystem for high-performance numerical libraries. She also partners with Michael Heroux of Sandia National Laboratories to lead the IDEAS scientific software productivity project. Members of the IDEAS team are catalysts for engaging the ECP community to understand productivity bottlenecks and to improve both developer productivity and software sustainability, as key aspects of increasing overall scientific productivity. This is an edited transcript of our conversation.

The xSDK project provides infrastructure and interoperability among numerical libraries developed by diverse groups within ECP. A project that complements xSDK, called IDEAS, is working with the ECP community to understand productivity bottlenecks and to improve both developer productivity and software sustainability, as key aspects of increasing overall scientific productivity. Guest: Lois Curfman McInnes, Argonne National Laboratory.

Transcript:

What are your two respective projects, xSDK and IDEAS, all about?

Lois Curfman McInnes: xSDK is a first-of-a-kind project that’s looking to provide infrastructure and interoperability among software libraries developed by diverse groups within the ECP community. The project is motivated by the need for next-generation science applications to use and build on diverse software capabilities that are developed by different groups.

There’s recognition that no single individual, team, or library can provide all the functionality that next-generation applications need. So we as the ECP community must work together to enable our software to function in combination so that we can work toward addressing next-generation challenges in predictive science and engineering.

A key benefit of using software libraries is to encapsulate sophisticated capabilities in reusable software so that applications, say, in chemistry, nuclear physics, biology, and all sorts of areas, can build on well-tested and advanced algorithms and data structures that have been developed by experts. This way, scientists don’t need to write complex code themselves but can instead focus on their primary interests—whether that’s chemistry, physics, or whatever the discipline may be.

Within the context of the xSDK, we’re bringing together a community of diverse people who develop widely used libraries, and we’re defining community policies that are enabling our software libraries to interact and be used compatibly together in diverse scientific applications.

This work is motivated by the direct needs of application teams to pursue multiphysics and multiscale models, where often different components of the simulation are built on different underlying libraries, and consequently those libraries need to be brought together to function compatibly.

So, within xSDK, we are developing capabilities that enable our high-performance numerical tools to work together, and our foundational work has enabled us to issue [during SC17] our first release of the combined set of libraries in the xSDK for ECP. This release includes the four founding xSDK libraries [hypre, PETSc, SuperLU, and Trilinos] as well as three additional libraries [MAGMA, MFEM, and SUNDIALS]. Additional mathematical libraries in ECP are working toward inclusion in future xSDK releases.

Through the xSDK effort, users can now download, install, and employ these numerical libraries in combination. This work then positions us to go forward and look at deeper levels of interoperability within the software as needed for next-generation applications.

A second project that I co-lead, that’s complementary to the xSDK, is the IDEAS project, which focuses on increasing the sustainability of software artifacts and the productivity of software developers by working with scientific software teams to identify and adopt practices that help improve software quality. This is a collaborative project involving various US Department of Energy [DOE] national labs to work toward partnering with science teams in the ECP to understand their productivity bottlenecks and figure out what their biggest pain points are in their codes. Then we will work toward identifying ways to incrementally improve their software practices to support better science. We also want to assist them in attaining better sustainability of their software to enable the pursuit of more sophisticated extreme-scale science goals.

Groups are looking to improve many practical capabilities that are essential for focusing on and doing their work. The areas of attention include software version control, testing, building, deployment, refactoring, and debugging.

We believe what the IDEAS project does is incredibly important, because scientific software is the means by which we as a community collaborate. It’s the way we bring together capabilities in mathematics and computer science, as well as domains in science and engineering, to make progress in extreme-scale computational science. So focusing on effective ways for teams to develop software, considering it as a primary vehicle of collaboration, is an important aspect of work for our community.

How are xSDK and IDEAS helping to advance scientific discovery, industrial research, and national security?

Lois Curfman McInnes: We’re working very hard to advance ECP community efforts, so our projects involve people throughout various DOE national labs. We’re also engaging people even outside of the projects to work together to create a sustainable software ecosystem that provides the foundation for the next-generation applications to do more advanced simulations and work toward their science goals.

What we see as incredibly important is providing software in such a way that it can readily adapt as architectures change and can be sustained and extended as needed to address new science capabilities. xSDK and IDEAS are coming at that from two complementary angles.

The scientific software development kit is creating community policies and using best practices in the development of reusable software that’s then employed by next-generation applications. So xSDK is helping through the actual software stacks themselves.

Meanwhile, the IDEAS project is examining productivity bottlenecks that are faced by ECP application teams and also software technology teams and developing best practices, processes, and tools to help those teams be more effective in developing their software.

And just generally speaking, xSDK and IDEAS are trying to assist the overall ECP community to sustain its software moving forward.

Besides your recent release, do you have other milestone highlights to report?

Lois Curfman McInnes: Yes, we have been working on the xSDK project, even before the ECP began, to look at issues and challenges in numerical software interoperability. We’ve been able to make very strong progress in the context of the ECP and in bringing together diverse teams to determine community policies and work toward a combined and collaborative process for software release.

We’re very excited that our milestone for this quarter, the first quarter of fiscal year 2018, is the first release of xSDK as needed by ECP teams. So this brings together community policies that have been discussed, adapted, and adopted by our library partners. This milestone achievement also brings together approaches for building, configuring, and distributing software. Our numerical libraries are built using Spack, which is a tool led by Todd Gamblin at Livermore. Spack helps make it easy to install the various pieces of software we need and enable these to work seamlessly across different architectures.

How are your projects collaborating with others and integrating the contributions of the various researchers?

Lois Curfman McInnes: One of the things I love about my work is that computational science is inherently collaborative. It’s very exciting to bring together the complementary contributions of people in the math and computer science domains. Our surveys of application teams to understand their biggest productivity bottlenecks have shown that many of the teams are aggregates of existing successful teams that are collaborating to achieve new science goals. Much of what we are focusing on is developing capabilities, processes, and practices that enable these teams to work together more successfully.

We’re not just working to improve the practices and processes of the developers of one package, but of multiple packages working together toward next-generation chemistry, nuclear physics, or whatnot. So that’s been pretty exciting.

In reference to your earlier question about milestones, I just realized that I should also mention an important milestone that we’ve achieved for the IDEAS project, and that is releasing a set of resources called Productivity and Sustainability Improvement Planning [PSIP] tools. These are all accessible online through our GitHub repository and can be used by anyone. This is a set of resources that enables us to communicate with an application team to try to understand their biggest bottlenecks and then develop incremental, repeatable steps to move toward improving certain practices that they may find are limiting their effectiveness. We’re using these PSIP tools in our work with ECP application teams to help improve the group’s software practices in support of their science goals.

Would you say collaboration and integration are extremely important to the overall success of the ECP?

Lois Curfman McInnes: They are fundamentally at the heart of what we’re doing, as we’re working across software technologies, hardware and integration, and applications. There is quite a bit of collaboration, and that’s one reason I love this work and the ECP.

The ECP has an incredibly talented set of science leaders and teams working together toward these next-generation science goals. It is a really compelling mission, and it is inspiring us all to work together in ways we haven’t before in my career. It’s very exciting.

Has your research taken advantage of any of the ECP’s allocation of computer time?

Lois Curfman McInnes: Our numerical software teams develop and test their software on ECP computing resources. We develop all of our software so that it can run across the various computing facilities in DOE, and we strive to make it available on all mainstream machines that our users need.

To do the development and testing for our software, the individual library teams have indeed accessed the capabilities of the extreme-scale machines at Oak Ridge, Argonne, and NERSC [National Energy Research Scientific Computing Center]. Our most recent xSDK software release has been tested on key machines at those facilities.

Your question about using allocations at facilities brings up the point that software technologies projects need different kinds of access to facilities than do typical application teams. Oftentimes, we as infrastructure providers need to test our software at various scales. Sometimes we need quick turnaround for lots of small jobs. We’re making sure our software is continually functioning properly. We sometimes need resources of the whole machine to test our scaling.

But we’re in a different mode of needing to use machines than mainstream applications that often focus on simulation campaigns; their usage models are different. We’re having some conversations with facilities about the kind of access that is really needed for software technologies projects.

What’s next for your ECP research activities?

Lois Curfman McInnes: We’re working with a variety of application teams in our xSDK project to test the use of our tools and to have an impact on better numerical software capabilities, to thereby enable their science. We’re working very carefully to make sure our algorithms and data structures within the libraries are all moving ahead as needed to work effectively on emerging architectures.

Within the libraries themselves, teams are working on refactoring and developing new algorithms, taking advantage of new machines. . . . Then within our work in the xSDK, we’re making sure all of those new capabilities can be readily packaged and made available to applications in a way that’s easy for them to use. Likewise, on the front of software productivity and sustainability, we are working with teams to improve some of their practices so that we, as a community, can work toward more sustainable software that’s more readily adaptable over time to meeting the continuing challenges that arise.
