While much noise is being made about the race to exascale computing, building productive supercomputers really comes down to people and ingenuity. In this special guest feature, Donna Loveland profiles supercomputer architect Robert Wisniewski from Intel.
“I’ve always liked doing big science,” says Robert Wisniewski of Intel Corp., “and the notion of building the biggest supercomputer is a great goal.”
How big is biggest?
When Intel and Cray deliver the Aurora machine to the U.S. Department of Energy’s CORAL project in 2018, Robert will have overseen the system-centric design for the software of the world’s fastest supercomputer, running at a projected 180 petaflops (PFLOPS).
It will be a milestone for some, a stepping stone for Robert, whose team is already starting work on machines for the next decade.
Extreme Exascale
As a conversation piece at conferences, Robert carries a PEZ candy dispenser, ready to spark a discussion about Peta-Exa-Zetta, PEZ for short. It’s a handy acronym for the evolution of compute power from Petascale (10^15) to Exascale (10^18) to Zettascale (10^21) operations per second. Exascale means a billion billion calculations per second, a thousandfold increase over the petascale computers that came into operation in 2008. Zettascale is a thousand times exascale again.
Robert’s excitement about Aurora is twofold. He stands with feet in two closely aligned worlds.
He’s Chief Software Architect for CORAL, the Collaboration of Oak Ridge, Argonne, and Livermore. A joint procurement activity among three of the Department of Energy’s National Laboratories, CORAL was established by the DOE in early 2014 to leverage supercomputing investments and develop key tools for technology advancement and scientific discovery.
Robert’s also Intel’s Chief Software Architect for Exascale Computing, a title he’s deliberately transitioning to Extreme Scale Computing because, to his thinking, exascale is misleadingly small. In partnership with Cray, Intel will be creating machines that will be five to seven times more powerful than today’s fastest systems.
Transition All Around
The word “transition” figures largely in Robert’s work.
Argonne’s current HPC machine is Mira, a 10 PFLOPS IBM Blue Gene/Q system delivered in 2012. Mira’s successor can be viewed as a pair of freshly architected systems. Theta, projected to run at 8.5 PFLOPS, is an early production system based on the second-generation Intel® Xeon Phi™ processor. Scheduled to arrive later this year, Theta is what Argonne calls a bridge between Mira and its “next leadership-class supercomputer, Aurora,” to be delivered in 2018.
To support users transitioning from Mira, the Argonne Leadership Computing Facility (ALCF) set up an Early Science Program (ESP) modelled after its highly successful ESP for Mira. Like its predecessor, the Theta ESP works as a test bed, running selected compute-intensive applications that represent a wide spectrum of scientific areas and numerical methods.
Though ESP stands outside his CORAL responsibility, Robert meets frequently with the Argonne applications staff and system software team, helping to prepare scientific applications – and scientists – for the scale and architecture of the new system, as well as its new software stack. He regards ESP participants not only as customers, but also as collaborators in helping tackle the challenges of future computing.
Interacting with computational scientists, domain experts who understand computer science, is particularly valuable. “They can talk to us in a way that provides insight into how we can design better software and influence hardware,” he says.
Really Big Projects
Even in his early academic and professional life, Robert operated at a broad level. He switched from the discipline of computer engineering to computer science in his sophomore year at Cornell University and joined HPC-leading Silicon Graphics after earning his doctorate at the University of Rochester because he “always liked envisioning and building really large machines.” Looking back, Robert sees his career in supercomputing as “inevitable,” based on his skills and interests.
At SGI, Robert worked on high-end parallel OS development, parallel real-time systems, and real-time performance monitoring. What drew him to IBM Research a few years later was the opportunity to work on K42, an operating system kernel research initiative to design, from the ground up, an OS that would scale up to the largest capability-class machines and scale down to small clusters.
During his 15 years with IBM, Robert’s career progressed to chief software architect and manager for the last two generations of IBM’s Blue Gene, a series of PFLOPS systems that led the TOP500 rankings of the world’s most powerful and most power-efficient supercomputers. (“Gene” derives from IBM’s goal of supporting the massive scientific computation required for folding a protein.)
Blue Gene fostered a broadly collaborative environment. What Robert saw then as a unique opportunity for software to influence hardware has become the hallmark of his own wide-reaching work.
Throughout Blue Gene, all three teams, Hardware, Applications, and System Software, routinely sat down together and focused on two key questions: What do we need to do to make this machine work well? and What do we need to do to meet the needs of the customer?
As Chief Software Architect and Manager of Blue Gene Supercomputer Research, Robert answered those questions with Argonne’s Mira, the 10 PFLOPS Blue Gene/Q, and Sequoia, the 20 PFLOPS Blue Gene/Q, which was the fastest machine in the world in June 2012.
That year, Robert joined Intel, eager to have a broader impact and achieve greater performance, namely, reach exascale and transition computing by designing software that scales to an extreme level, beyond exascale.
What Software Can Do
What enticed Robert to Intel was the prospect of effecting change far beyond the project level. He recognized the opportunity to push the development envelope through close collaboration with all three elements of a supercomputer team: hardware, applications, and system software (everything between the two). “An important part of my role,” he says, “is to influence future hardware generations so that as we write software we can realize the potential of the hardware.”
The researcher in him, looking back on 25 years of HPC work, knew the value of customer interaction. Hence his finger on the pulse of Theta testing in Argonne’s Early Science Program. More than serving as a conduit to the hardware team, he’s guiding software development for applications geared to future platforms.
In Robert’s view, which goes beyond the horizon, Intel’s Knights series of processors – the Intel Xeon Phi processor series – is where technology is headed. “Whenever we design for HPC,” he points out, “we need to be very aggressive to achieve the goals the customer is looking for.” At present, Robert’s reach extends to the challenges application developers face.
Insider Advice
In the near term, which by Robert’s definition takes the future into account, interplay between application developers and system developers happens in two key areas.
The first, and most important, is threading. HPC applications have traditionally achieved parallelism by using MPI to communicate among nodes that do not share memory. However, as the number of cores within a node has increased, application developers now need to expose parallelism through threading within an MPI rank to take advantage of the full power of the machine. A lesson learned from his days with Blue Gene/P, when the trend was just starting: threading makes a huge difference. With the Intel Xeon Phi processor, threading code will make an even more significant difference. Runtimes for other programming models, such as PGAS, are experiencing the same effects.
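As a rough illustration of that two-level decomposition, here is a minimal sketch under stated assumptions. The "ranks" are simulated by a plain loop rather than real MPI processes, and Python threads stand in for the threads inside a rank. A production code would more likely pair MPI with a threading model such as OpenMP, and CPython threads will not show a real speedup on this workload. The function names and the sum-of-squares workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n_items, n_parts, part):
    """Return the half-open index range [lo, hi) owned by `part` of `n_parts`."""
    base, extra = divmod(n_items, n_parts)
    lo = part * base + min(part, extra)
    hi = lo + base + (1 if part < extra else 0)
    return lo, hi


def rank_partial_sum(data, rank, n_ranks, n_threads):
    """Work done by one (simulated) rank: thread over its own slice only."""
    lo, hi = split(len(data), n_ranks, rank)
    local = data[lo:hi]  # the only data this rank would hold in its own memory

    def thread_work(t):
        # Second level of decomposition: threads share the rank's local slice.
        t_lo, t_hi = split(len(local), n_threads, t)
        return sum(x * x for x in local[t_lo:t_hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(thread_work, range(n_threads)))


def hybrid_sum_of_squares(data, n_ranks=4, n_threads=8):
    """Combine per-rank results; with real MPI this step would be a reduction."""
    return sum(rank_partial_sum(data, r, n_ranks, n_threads) for r in range(n_ranks))


if __name__ == "__main__":
    print(hybrid_sum_of_squares(list(range(1000))))
```

The point of the structure is that data is divided once across ranks, which do not share memory, and again across threads, which do, so the cores inside a node are all put to work without adding more ranks.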
The second is balancing computation and memory. The relative amount of memory has been significantly reduced in comparison to the increase in available computation, and new levels in the memory hierarchy have been introduced to address the memory bandwidth challenge. Together, these require application developers to rethink how data is placed in memory, and to ensure critical data is placed in the smaller amounts of higher bandwidth memory.
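One way to picture the placement decision is a simple greedy heuristic: rank each array by how much memory traffic it generates per gigabyte of footprint, then fill the fast memory in that order. The sketch below is only an assumption about how a developer might reason about it, not a description of any Intel tool or API. The array names, sizes, traffic estimates, and the 16 GB capacity are all made up for illustration, and a greedy pass is not guaranteed to find the best possible placement.

```python
from dataclasses import dataclass


@dataclass
class Array:
    name: str
    size_gb: float     # memory footprint (must be > 0)
    traffic_gb: float  # estimated data moved per time step (hypothetical profile)


def place_arrays(arrays, fast_capacity_gb):
    """Greedily assign the arrays with the most traffic per GB to fast memory."""
    placement = {}
    remaining = fast_capacity_gb
    # Visit arrays from highest to lowest traffic per GB of footprint.
    for a in sorted(arrays, key=lambda a: a.traffic_gb / a.size_gb, reverse=True):
        if a.size_gb <= remaining:
            placement[a.name] = "fast"
            remaining -= a.size_gb
        else:
            placement[a.name] = "slow"
    return placement


if __name__ == "__main__":
    arrays = [
        Array("stencil_grid", size_gb=12, traffic_gb=240),
        Array("particle_ids", size_gb=6, traffic_gb=12),
        Array("lookup_table", size_gb=2, traffic_gb=80),
        Array("checkpoint_buffer", size_gb=30, traffic_gb=30),
    ]
    for name, tier in place_arrays(arrays, fast_capacity_gb=16).items():
        print(f"{name}: {tier}")
```

In this made-up profile the small, heavily used lookup table and the stencil grid land in fast memory, while the large, rarely touched checkpoint buffer stays in ordinary memory.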
In combining the threading and memory challenges, there’s an increased need for the hardware to perform synchronization operations, especially intranode ones, efficiently. With more threads utilizing less memory with wider parallelism, it becomes important that they synchronize among themselves efficiently and have access to efficient atomic memory operations. Applications also need to be vectorized to take advantage of the wider FPUs on the chip. While much of the vectorization can be done by compilers, application developers can follow design patterns that aid the compiler’s task.
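The synchronization point can be sketched with a toy histogram. In the first version every update goes through a lock on shared bins, a stand-in here for a hardware atomic add. In the second, each thread fills private bins and synchronizes once to merge them. This is an illustrative pattern under assumed names, not code from any Intel or Argonne application, and Python threads are used only to show the structure.

```python
import threading


def _run(values, n_threads, work):
    """Start one thread per strided chunk of `values` and wait for all of them."""
    threads = [threading.Thread(target=work, args=(values[t::n_threads],))
               for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


def histogram_shared(values, n_bins, n_threads):
    """Every update locks the shared bins (stand-in for an atomic add)."""
    bins = [0] * n_bins
    lock = threading.Lock()

    def work(chunk):
        for v in chunk:
            with lock:
                bins[v % n_bins] += 1

    _run(values, n_threads, work)
    return bins


def histogram_private(values, n_bins, n_threads):
    """Each thread fills private bins, then synchronizes once to merge them."""
    bins = [0] * n_bins
    lock = threading.Lock()

    def work(chunk):
        local = [0] * n_bins
        for v in chunk:
            local[v % n_bins] += 1
        with lock:  # one synchronization per thread instead of one per value
            for i, count in enumerate(local):
                bins[i] += count

    _run(values, n_threads, work)
    return bins
```

Both versions produce the same counts. The second simply touches the shared structure far less often, which is the kind of design choice that matters more as thread counts per node grow.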
Robert co-leads a series of Software Technical Interchange Meetings where company experts regularly discuss plans, raise issues, bridge gaps, and share information across the various ongoing HPC software efforts. Their goal: make a system-centric model work. No one operates in isolation; everyone has a view to the components across the whole software stack. This inclusive approach represents a cultural shift from Intel being an ingredient provider to being a solutions provider.
The Frame for a Bigger Picture
Under the “Intel Inside” banner, the company has earned its strong reputation as an ingredient supplier.
Five years ago, Robert points out, the HPC software world looked at Intel and saw “an amazing MPI, a top-notch compiler, and cool developer tools,” not a system software solution. Robert and his team are driving to change that by taking the power of supercomputing not only higher but also wider and broader.
The framework for the future is just that: Intel® Scalable System Framework (Intel® SSF). Intel SSF can be understood both as a platform definition and as a realization of that definition.
Consider, as Robert and his team do, the demand to provide computation- and data-intensive capability across a spectrum of areas beyond traditional government and research labs. Business, medicine, engineering, and social sciences crave access to HPC.
For HPC to succeed at that level, paradoxically, the level of computing needs to drive higher, to an extreme scale. Administering a machine operating at 50,000 nodes or greater calls for techniques beyond simply going node-to-node. By bringing out the capability in Intel SSF, increasingly powerful systems (the Aurora class and upward) will expose users to the technology they need and enable them to use it.
Making Bigger Broader
“It’s really cool to work on the machine that’s going to be the biggest and fastest in the world,” Robert says. “But there’s only one of them – or a small handful.
“I came to Intel to be able to realize my vision of software having impact across all of technical computing and HPC. With OpenHPC, and future Intel products based on OpenHPC, we’re working to integrate a multitude of components that are commonly used in HPC systems. We now have a vehicle for collaborating on high-end technologies within the broadest ecosystem in the world.” Community input is already flowing at a pace Robert terms amazing. “Customers who previously were unable to leverage HPC will be able to take advantage of it and have a large impact.”
Could this be analogous to the way NASA technology, once considered esoteric, found its way through business and industry into familiar advances like LEDs and infrared thermometers? It just may be.
Robert Wisniewski will be a speaker at an ISC 2016 BoF session: Monitoring Large-Scale HPC Systems: Data Analytics & Insights, which takes place June 22 in Frankfurt.