In this special guest feature, John Kirkley writes that a recent panel discussion with three Intel Fellows looked at how to move beyond traditional HPC for the future of scalable applications performance.
When three distinguished Intel Fellows, Bill Magro, Mark Seager and Al Gara, sat down together to discuss HPC’s Next Phase, the conversation was lively: all three are working on cutting-edge aspects of the rapidly evolving technology portfolio for the high performance computing ecosystem.
Moderating the panel discussion at SC15, Intel’s Mike Bernhardt kicked off the discussion with a question about the current wall for HPC memory and storage technology and how we can move past the limitations.
The panelists were optimistic. Al Gara commented, “There is a lot of exciting technology that is feeding into both of those areas.” He cited advances like the latest Intel® Xeon Phi™ processors, code-named Knights Landing, which make new levels of memory available, and breakthroughs in non-volatile memory technology that “are allowing us to do things in storage that are fundamentally different from what we’ve done before.” He stressed that these changes do not necessarily mean incompatibility with legacy applications. “It’s not that we can’t do what we did before, it’s that we will be able to do so much more.”
Mark Seager, who also holds the position of Chief Technology Officer for the High Performance Computing Ecosystem at Intel, added that advances in this realm are not incremental. The addition of non-volatile memory on the node and advances in storage hierarchy have provided a one-time boost factor of 10³, a 1,000X improvement in certain parameters like latency. This, he said, will make a tremendous difference in running existing workloads and enable whole new sets of workloads.
Bill Magro noted that “the best thing that can happen is that your hardware advances, but your software doesn’t have to change.” But that’s unlikely – large systems have the potential for big disruptions to applications performance and this will have to be managed. “We need to ask how can you take MPI code and realize benefits in a composable way without rewriting your code from scratch.”
Looking at Intel® Scalable System Framework
Bernhardt then pointed the panel to the widespread interest in the recently announced Intel® Scalable System Framework (Intel® SSF) and one of its most important building blocks, the Intel® Omni-Path Architecture (Intel OPA), which was announced at the show. Intel SSF is a flexible blueprint for developing scalable, balanced and efficient HPC systems ranging from the smallest clusters to the largest supercomputers. Bernhardt asked if Intel SSF could accommodate both ends of the spectrum.
The consensus of the panel was that making full use of Intel SSF requires system thinking at the highest level. This entails deep collaboration with the company’s application end-user customers as well as with its OEM partners, who have to design, build and support these systems at the customer site.
Seager commented: “For the high-end we’re going after density and (solving) the power problem to create very dense solutions that, in many cases, are water-cooled going forward. We are also asking how can we do a less dense design where cost is more of a driver.” In the latter case, lower-end solutions can relinquish some scalability features while still retaining application efficiency.
Gara, whose experience includes development of the system architecture for several generations of the IBM Blue Gene systems, said that with the Intel SSF ingredients – the processors, interconnect, storage, memory and software – architects can take a holistic approach from a system perspective. “Define the system and the ingredients fall out of that, rather than the other way around,” he said.
Added Magro, “This is a collection of building blocks that can be put together to work together, work better together … and be assembled into a number of different reference architectures: everything from classic HPC clusters up to supercomputers.”
“I keep hearing ‘Traditional HPC is dead,’” said Bernhardt. “So what is HPC now as the worlds of HPC, analytics and Big Data all merge together? Is this new framework (Intel SSF) a key part of this?”
“Absolutely,” responded Magro, who is also the Chief Technologist for HPC software at Intel. “We step back and look at the workloads first and the scale of the system and see how you can address them with an architecture or collection of architectures that address the different segments. We’re not just putting these building blocks together to address individual point designs, but (we) are building designs that span these worlds.
“However, the reason people are having difficulty bringing these building blocks together is that there are different storage models and resource management models,” he added. “The most important thing is to arrive at a resource management framework that understands the different needs and expectations of those different models. (We can) start with an object storage model at the bottom and expose the correct semantics – whether it’s by HDF5, the Lustre* file system, or direct object store. We know how to do that and… will be able to work with the community to really create systems that tackle these problem sets at the same time.”
Seager pointed to an emerging new market as an example of Intel SSF in action – machine learning. “The interesting thing about machine learning is … the basic computations turn out to be very similar to the linear algebra problems that HPC has to solve as the inner loop of those applications,” he commented. “So the design space we have for the Intel Scalable Systems Framework comprehends the breadth of that class of applications so we can tailor solutions to a specific workload and create a generally applicable solution. (It) also provides the gradual evolution of these capabilities for exascale. (This includes) the portability of those applications not only within a generation, but also over multiple generations over time.”
Gara, referring to Bernhardt’s earlier comment about the death of traditional HPC, quipped, “I think it’s pretty obvious by the size of the audience (at SC15) that it’s not dead yet. But I also agree that workloads are changing; not that traditional HPC is disappearing but there are new workloads that are being added. And I think that traditional HPC workloads are growing, becoming more complex, requiring more performance.”
Gara said that one of the differences between current and traditional uses of HPC is that we are in a unique period with the emergence of many new technologies. “It’s all hitting us at the same time,” he noted, “which is an unusual situation – it gives us the opportunity to do things in a new way. When I say look forward, I’m thinking 2022 time scale…I see dramatically different ways that we’ll be doing compute.”
In the meantime, he does not currently envision the need for a dramatic upheaval in the user communities. “I think there will be opportunities to explore new program models and get even more performance, but I also see that we will continue to be able to get higher productivity out of machines as we go forward. I think the future is pretty bright when I see the technologies we’re bringing into play, how we architect the systems, the choices that we make. In general, advances are always driven from an application perspective – now one of the differences is that our application span is broadening so we are looking at (advanced technologies such as) machine learning and data analytics.”
Seager pointed out that HPC has become an integral part of the scientific method – the third leg along with theory and experiment. New disciplines, like health and life sciences, are vigorously embracing this change in the way science is conducted. “They are starting later in time than engineering, physics and more traditional HPC users and addressing technologies like machine learning, data analytics and simulation sciences,” he said, pointing out that the opportunities for advancement in new markets are quite substantial. This includes softer disciplines such as the “digital humanities” and psychology, which are making good use of the various analytic and computational tools offered by HPC.
Speaking of emerging markets, Magro pointed to an “underserved” segment – small- to medium-sized manufacturers in the U.S. and Europe. “Solving their technology problems is actually relatively easy for this community,” he said. “It’s not that hard to build systems that are compatible, capable, high performance and affordable.”
But he also noted that substantial barriers exist around business models and workforce capabilities: for example, training manufacturers to create their first digital design using HPC capabilities, or teaching them to articulate why they should take the risk of moving from physical design and physical prototyping to simulation-based digital design and prototyping, a move that may be harmful to the business. It’s all part of dipping their toes into the water of HPC. Once this advanced hardware and software is affordable, these new users can “consume a little bit, show value and then iterate their way up the curve.”
Magro added, “There are technologies such as cloud computing that can address these kinds of issues and bring more people into the fold. And it’s not just small- to medium-sized businesses – there are a ton of consumers out there who could start using HPC in new ways.”
Gara agreed. “One of the hallmarks of good HPC design,” he said, “is when you design a machine and it’s used in ways you never imagined. When we look at the places where HPC could be applicable, we are only reaching a small fraction of these potential users. It’s not that the machines aren’t capable – we just don’t have enough programmers out there who have the knowledge to make it all happen.”
A spirited question and answer period followed, ranging over topics like fault tolerance, error correction within applications, the emergence of microservices inside the cloud space, and the benefits of the newly formed OpenHPC initiative.