Our in-depth series on Intel architects continues with this profile of Mark Seager, a key driver in the company’s mission to achieve Exascale performance on real applications.
What would you like to save? A few million in capital budget? How about thousands of lives every year?
Questions like these haven’t traditionally brought to mind High-Performance Computing, with its history of massive investment and racks of cold metal. Even so, the power of HPC has advanced to the point where it’s literally in the palm of your hand whenever your phone updates the local weather forecast. Today, access to HPC machines is just a credit card away.
Like many mathematicians intrigued by computing, Intel Fellow Mark Seager set out to take advantage of emerging technology to solve scientific problems.
In the process, he and his colleagues not only changed the method of problem solving; they also changed the scientific method itself. As a result, the benefits of ever-greater computing power, once exclusive to leading-edge researchers, are reaching a very human level.
Paralleling parallel computing
Mark jokes that his professional life parallels the development of parallel computing. His first job, as a college student back in the mid-70s, was with the Air Force Weapons Lab in Albuquerque, where he wrote simulation code on a CDC 7600 machine using hundreds of punch cards – heady stuff at the time.
“After that experience, I knew I wanted to do scientific simulation as a career,” Mark says. He earned a bachelor’s degree in mathematics and astrophysics from the University of New Mexico and went on to a Ph.D. in numerical analysis from the University of Texas at Austin. A job with Lawrence Livermore National Laboratory soon followed.
At the time, vector supercomputing based on ECL (Emitter-Coupled Logic, a bipolar circuit technology that CMOS later displaced) was giving way to “killer micros” – the highly integrated CMOS devices Intel and other vendors were building. The National Labs started to consider how to aggregate multiple processors into some sort of parallel architecture. They created the Parallel Processing Project and, in 1983, hired Mark onto the team.
“The system we worked on was an early prototype Sun-1 workstation,” Mark recalls, “and Andy Bechtolsheim delivered it in his van from Palo Alto.” (Bechtolsheim co-founded Sun Microsystems in 1982.) The group’s first parallel system was a Sequent Balance 8000 with eight NS32032 processors and speed measured in MIPS (millions of instructions per second) and MFLOP/s (millions of floating-point operations per second). “The last system I procured,” Mark continues, “had 65,536 nodes and more than 1.0 million cores, a scale of parallelism unimaginable in 1983 when I started at the Lab.”
As leader of the Livermore Advanced Technology Office, Mark was instrumental in developing and deploying the ASCI White, Thunder, BlueGene/L, Purple, and Sequoia systems, which debuted at numbers 1, 2, 1, 3, and 1 on the TOP500 list, respectively; BlueGene/L went on to hold the number 1 spot for a record seven consecutive lists.
Power leads to prediction
Major strides in computing power, impressive as they are, were no surprise. After all, that’s what government agencies and universities were demanding and what companies were striving to deliver. What was unexpected was the advance in the predictive capabilities of scientific and engineering modeling and simulation.
Over the last 25 years, computer power has increased by a phenomenal factor of a million, or 10^6.
To put that in practical perspective, Mark points to Dijkstra’s Law, which says “A quantitative difference is also a qualitative difference if the quantitative difference is greater than an order of magnitude.” For scientific computing, that means an order-of-magnitude improvement in computing power yields a qualitative improvement in simulation capability, and with a 10^6 improvement it’s a whole new ball game.
“So think about it,” he continues. “We now have an increase of six orders of magnitude. The types of scientific model simulations we can develop and run today are vastly superior to the models we had back in the day when I started in parallel processing.
“We’ve gone from 2D models to 3D models. We’ve gone from very simplified physics equations to material models with very complicated ones. Along the way, we’ve gone from interpolative simulations – interpolating physical system dynamics between known data gathered from experiments – to predictive simulations.”
And predictive simulation has brought theory and experiment together in such a compelling way that it has fundamentally extended the scientific method for the first time since Galileo Galilei turned the telescope to the heavens in 1609 and extended human senses with manufactured devices.
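To get a feel for what six orders of magnitude means in practice, here is a small back-of-envelope sketch in Python. The 25-year window and the factor of 10^6 come from Mark’s remarks above; the one-week example run is purely illustrative.

```python
import math

speedup = 1_000_000            # the ~10^6 increase in computing power Mark cites
years = 25                     # over roughly 25 years

doublings = math.log2(speedup)                 # about 20 doublings
months_per_doubling = years * 12 / doublings   # about 15 months per doubling
print(f"{doublings:.1f} doublings, one every {months_per_doubling:.1f} months")

# Dijkstra's Law in concrete terms: a simulation that once ran for a week
# finishes in well under a second at a million-fold speedup.
week_in_seconds = 7 * 24 * 3600
print(f"A one-week run then takes about {week_in_seconds / speedup:.2f} s now")
```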
A breakthrough that began with a crack
To illustrate, Mark cites his own work on metal fatigue back in 2003. He and his team were modeling the strength of materials and their failure under extreme conditions. In a military lab context that’s particularly serious research. “You don’t want to get the model wrong when you’re making a nuclear weapon,” he observes.
A long-standing question in fracture mechanics was what it would take to make a crack in a metal bar go supersonic – to propagate faster than the speed of sound in the material – and whether such an event could even take place.
The test is fairly mundane. Take a bar of copper with a little divot. Hold the bar below the divot, and hit the bar above the divot with an anvil. Hitting the bar creates a crack that travels some distance at some speed. The question was whether the bar could be hit hard enough to make the crack propagate at supersonic speed, which would produce a shockwave.
Theoretical mathematicians developed a model, analyzed the equations, and determined the answer was no, you can’t.
A team at Caltech developed an experiment where they measured the speed of the crack, failed to detect a shockwave, and likewise concluded the answer was no.
Mark’s team, which was building the 12.3 TFLOP/s ASCI White system at the Livermore Lab, ran a molecular dynamics application with a record 50 billion atoms, enabling a predictive simulation of crack propagation in copper. After two weeks of nonstop computing, the answer came back yes: there is a shockwave, and you can cause the crack to go supersonic.
“For a long time we thought it was a bug in the system or code,” Mark explains. “After a lot more work we not only convinced ourselves we were correct but we also could explain how the theorists and experimentalists got it wrong, which was pretty amazing.”
By visualizing the vast quantity of time-dependent data the simulation produced as the crack propagated through the copper bar, and by making a 2D movie of the event, Mark’s team could see that hitting with greater force ultimately produced a small secondary crack on the edge of the main crack, and that this much smaller “daughter” crack was the one that went supersonic.
The theorists had reached a faulty conclusion by over-simplifying their equations to make the math manageable; the simplifications eliminated the very possibility of a daughter crack forming. The experimentalists got it wrong by calibrating their instruments to gather data from the much larger primary “mother” crack. After recalibrating for the smaller daughter crack, they found the shockwave the simulation predicted.
“A lot of scientific and engineering research programs were experiments first, then theory came along behind. Simulation was an afterthought,” Mark observes. “In this crack propagation simulation we saw one of the first glimmers of predictive simulation – and of how simulation could drive scientific discovery.”
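The production run described above tracked a record 50 billion atoms on ASCI White. For readers unfamiliar with molecular dynamics, the sketch below is a minimal, purely illustrative example of the time-stepping at the heart of such a code: a velocity-Verlet integrator with a generic Lennard-Jones potential on a handful of atoms. Nothing here is the Lab’s code; every parameter and atom count is a placeholder.

```python
# Minimal molecular-dynamics sketch: velocity-Verlet integration of a few
# atoms interacting through a Lennard-Jones potential (reduced units).
# Purely illustrative; production crack-propagation codes use realistic
# interatomic potentials, billions of atoms, and massive parallelism.
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces on each atom."""
    forces = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r_vec = pos[i] - pos[j]          # vector from atom j to atom i
            r2 = np.dot(r_vec, r_vec)
            inv_r6 = (sigma * sigma / r2) ** 3
            # F = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * r_vec
            f = 24.0 * eps * (2.0 * inv_r6**2 - inv_r6) / r2 * r_vec
            forces[i] += f
            forces[j] -= f
    return forces

def run_md(pos, vel, mass=1.0, dt=1e-3, steps=1000):
    """Advance positions and velocities with the velocity-Verlet scheme."""
    forces = lj_forces(pos)
    for _ in range(steps):
        pos += vel * dt + 0.5 * (forces / mass) * dt**2
        new_forces = lj_forces(pos)
        vel += 0.5 * (forces + new_forces) / mass * dt
        forces = new_forces
    return pos, vel

# Toy system: a 2x2x2 cubic cluster near the Lennard-Jones equilibrium spacing.
spacing = 2.0 ** (1.0 / 6.0)   # ~1.12 sigma
grid = np.arange(2) * spacing
positions = np.array([[x, y, z] for x in grid for y in grid for z in grid])
velocities = np.zeros_like(positions)
positions, velocities = run_md(positions, velocities)
print(positions)
```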
Just one year later, in 2004, Mark received the Edward Teller Fellowship Award for his contributions to supercomputing.
Today, a dozen years later, that flip in the scientific method is pervasive. The tie between theory and experiment continues, but with theory and simulation tightly coupled. Theory goes into a simulator, and simulation results are compared to experimental data to validate the theory. Simulation also informs the design of experiments. The vast quantity of data experiments generate is analyzed to determine improvements in the simulation.
Simply performing experiments to prove a theory is history.
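To make the “validate the theory” step of that loop concrete, here is a schematic sketch. The simulated and measured arrays are hypothetical placeholders standing in for simulation output and the matching experimental measurements, and the 5% tolerance is arbitrary.

```python
# Schematic validate-against-experiment step; the arrays below are
# illustrative placeholders, not real data.
import numpy as np

def validated(simulated: np.ndarray, measured: np.ndarray, tolerance: float) -> bool:
    """True if the simulation matches experiment within a relative RMS error."""
    rms_error = np.sqrt(np.mean((simulated - measured) ** 2))
    rms_scale = np.sqrt(np.mean(measured ** 2))
    return rms_error / rms_scale <= tolerance

simulated = np.array([2.9, 3.4, 4.1, 4.8])   # model predictions (arbitrary units)
measured  = np.array([3.0, 3.3, 4.0, 5.0])   # experimental data (arbitrary units)
print("theory validated within 5%:", validated(simulated, measured, tolerance=0.05))
```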
Where simulation is going – and taking us
As Chief Technology Officer for the HPC Ecosystem at Intel, Mark is managing what he terms a sea change in the shift to simulation science – and the oceans of data involved. Today, it takes compute-intensive simulation to convince a funding agency or commercial investor of the discoveries that become possible when equally compute-intensive experiments and analysis are supported.
This change has so affected the nature of research that the Department of Energy, for example, is realigning the mission of its National Labs: large experimental projects are now prioritized by what is needed to develop the models and physical-property databases that quantitatively improve predictive modeling and simulation, both in physics and in emerging applications in health and the life sciences.
There’s no going back. That’s why Mark, in his current role as leader of HPC strategy for Intel’s Technical Computing Group, focuses on driving toward exascale and beyond. As Intel evolves from a supplier of highly integrated ingredients to a supplier of open, scalable systems, he works to ensure that the Knights series of processors, coupled with the Intel Scalable System Framework, supports discovery by delivering whatever the varied players in the ecosystem need.
Going forward no longer means going it alone
“The countries reaching exascale first,” Mark points out, “will make discoveries the fastest.”
The U.S. government endorsed that view in announcing the National Strategic Computing Initiative (NSCI) last July. An effort to create a cohesive, multi-agency strategic vision and federal investment strategy in HPC, the NSCI seeks to drive the convergence of compute-intensive and data-intensive systems while increasing performance. The target is one exaflop by 2023.
Government agencies will work with vendors to create advanced systems for applications involving combinations of modeling, simulation, and data analytics. They’ll also work with manufacturers and cloud providers to make HPC resources more readily available to researchers in the public and private sectors.
Creating and incentivizing an exascale program is huge. Yet more important, in Mark’s view, is that the NSCI has inspired agencies to work together to spread the value of predictive simulation. In the widely publicized Cancer Moonshot championed by Vice President Biden, the Department of Energy is sharing codes with the National Institutes of Health to simulate, on exascale systems, the chemical expression pathways of genetic mutations in cancer cells.
This type of predictive oncology is a leap ahead of the protein-folding simulations of the terascale era. What’s more, the Moonshot’s public/private partnership amplifies discovery by including technology providers like Intel as well as drug companies developing ways to alter the chemical pathways that express cancer.
While life-saving cures may be ahead of us, we already enjoy benefits as commonplace as planning a Saturday golf date thanks to 4-day and 7-day weather forecasts – predictive HPC simulations done daily by the National Weather Service and National Center for Atmospheric Research and then distributed to news services and smart phone applications.
As the need for HPC becomes more pervasive, a single-rack system with PFLOP/s performance at roughly $2 million becomes affordable to work groups, particularly when it’s shared. “With a scientific simulation application deployed in the cloud,” Mark observes, “you’re just a credit card away from the tens, or tens of thousands, of core hours you need to solve a wide range of complex problems.”
This ease of access will enable many more people to directly and indirectly improve their quality of life with scientific simulation.