In this guest article, Matt Ziegler, Director HPC & AI Product Management, HPC Product Marketing at Lenovo, explores the evolution and potential of exascale computing.
At this June’s International Supercomputing Conference (ISC) in Frankfurt, there will be a lot of buzz about exascale computing. The exascale hype has been gaining steam in the press lately, and for good reason. Ever since the petascale barrier was broken in 2008, technology users, companies and research institutions have set their sights on the holy grail of computing milestones. That milestone is now seen as within reach, as companies and consortia announce their plans to build the first exascale system. Those in the supercomputing field know it’s all about the “Wow!” factor: the next biggest system, the next grand achievement, and the bragging rights associated with jaw-dropping size and speed. Supercomputing means never being satisfied and always looking forward. Future system designs that show promise are finally being unveiled, and as expected, the usual players, vendors and technology companies are all vying for a piece of history. Given the scope, scale and importance of such an achievement, that’s understandable. One conclusion is inescapable: the exascale race has officially begun.
Is the past a prologue to the future? Let’s look back at previous achievements and the technology advances that were required to make petascale attainable. When the June Top500 list is released at this year’s ISC, it will officially mark eleven years since IBM’s Roadrunner, installed at Los Alamos National Laboratory, broke the petaflop barrier. It is also worth noting that Roadrunner’s feat came almost exactly eleven years to the day after ASCI Red at Sandia National Laboratories became the first system to break the teraflop barrier. In other words, it took eleven years to increase performance 1,000x. Where are we eleven years after Roadrunner? The top systems today are pushing the 200 PF mark, only one-fifth of the 1,000x gain achieved from ASCI Red to Roadrunner over the same span.
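To make that arithmetic concrete, here is a quick back-of-the-envelope check (a minimal sketch; the milestone figures and dates are the rounded ones cited in this article):

```python
# Back-of-the-envelope check of the milestones cited above (rounded figures).
teraflop_1997 = 1e12     # ASCI Red breaks 1 TF (FLOP/s)
petaflop_2008 = 1e15     # Roadrunner breaks 1 PF, eleven years later
top_2019      = 200e15   # today's top systems, ~200 PF, eleven years after that

gain_first_decade  = petaflop_2008 / teraflop_1997   # 1000x in eleven years
gain_second_decade = top_2019 / petaflop_2008        # only 200x in the next eleven

print(gain_first_decade)                       # 1000.0
print(gain_second_decade)                      # 200.0
print(gain_second_decade / gain_first_decade)  # 0.2 -> one-fifth of the prior gain
```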
The decelerating progress toward exascale over the same tera-to-peta time frame has been well documented in the industry: the slowing and, finally, the extinction of Moore’s law; the challenge and expense of getting to 7nm-or-better processor fabrication; and the physical hurdles of deploying an exascale system. The physical challenges alone are daunting: power and space constraints, cooling capabilities, network scalability, systems management, buildings and facilities. The difficulty of delivering generation-to-generation performance gains or breakthrough technologies has put pressure on both the technology providers and the high-end HPC facilities. All that being said, we should still be closer to exascale than we currently are. As an industry, we have some ground to make up.
Interestingly enough, the pressure to push the performance needle forward is causing the supercomputing industry to look backwards. Let me explain! Prior to Roadrunner, systems that broke performance barriers often relied on proprietary technology. Large technology companies built huge systems for a handful of customers. Supercomputers were thus owned by only a small slice of the overall computing market, and everyone else had to settle for time on a handful of systems to perform painstakingly slow research. Computational advances were made at the top, in the hope that they would eventually trickle down to the masses. Roadrunner changed that paradigm.
Like its supercomputing predecessors, Roadrunner was designed for a single, top-end customer, and it relied on alliances between technology heavyweights to design a system capable of petaflop performance. What Roadrunner did differently, however, was democratize HPC. Its foundation was built from commodity-off-the-shelf (COTS) technology already available in the marketplace instead of custom, proprietary form factors and technology. IBM chose its BladeCenter platform, pairing AMD Opteron processors with PowerXCell 8i accelerators, to provide the computing infrastructure. Linux was chosen as the operating system, and version 2.0 of the Extreme Cluster Administration Tool (xCAT), also open source, was written specifically to tackle systems management at scale, emphatically declaring that open source technology was here to stay.
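As an illustration of what “systems management at scale” means in practice, here is a minimal, hypothetical sketch that wraps xCAT’s rpower command (the wrapper function and node names are assumptions for illustration; it presumes a management node with the standard xCAT CLI installed):

```python
# Hypothetical helper around xCAT's rpower CLI (illustrative only; assumes
# an xCAT management node with rpower on PATH and nodes named cn001..cn960).
import subprocess

def power_states(noderange: str) -> dict:
    """Return {node: power_state} for an xCAT noderange, e.g. 'cn001-cn960'."""
    result = subprocess.run(
        ["rpower", noderange, "stat"],         # xCAT queries every node in the range
        capture_output=True, text=True, check=True,
    )
    states = {}
    for line in result.stdout.splitlines():    # xCAT prints one "node: state" line per node
        node, _, state = line.partition(":")
        states[node.strip()] = state.strip()
    return states

if __name__ == "__main__":
    # A single command fans out across the whole cluster; that noderange
    # abstraction is what made managing thousands of COTS nodes tractable.
    print(power_states("cn001-cn008"))
```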
Roadrunner flipped the script on how to build a supercomputer, cementing a trend that had begun in the early 2000s: supercomputers based on low-cost COTS components and open source software proliferated as new industries demanded the competitive advantages that deep research could provide. The Linux operating system, coupled with x86 processor technology, provided the world with an open, inexpensive base computing standard, eliminating the $100 million ante to play. Technological advancements that began in the personal computing space moved into the datacenter. Most importantly, multiple vendors could now provide the same computing technology, creating fierce competition. In parallel, the internet age was booming, and two-socket x86 systems became the de facto standard for the ISPs and hosting facilities that would come to be known as “hyperscalers” for their size and their ability to drive the lowest sustainable price point. Building large systems from standard pizza-box servers provided a cost-effective way of delivering IT. Enormous, fit-for-purpose scale-up systems gave way to scalable, general-purpose Linux clusters using common technology and tools.
Not surprisingly, governments, particularly the U.S. government, are at the forefront of the push toward exascale. What is interesting to note is that, unlike their petaflop predecessors, these efforts are turning back to proprietary technologies: proprietary interconnects, racks, cooling systems, motherboards and trays; non-standard form factors; single-vendor availability. Obviously, the stall in progress toward exascale has fueled this reversal, and such an approach could well succeed in hitting 1,000 petaflops first. However, a system built this way would have little commercial appeal: most customers have moved away from vendor lock-in, and their own push to exascale is based on open standards.
At Lenovo, we recognize that co-development and partnerships between the best and brightest minds are needed to successfully tackle grand challenges like exascale. We believe the approach taken in developing Roadrunner, using commonly available components and driving the performance of those technologies through collaboration and co-development with the end user, was the correct one. Lenovo’s exascale approach will combine the knowledge we’ve gained from two decades of open standards development with insights from our successful deployments at institutions such as the Leibniz Supercomputing Center (LRZ) in Munich, Germany. Our partnership with LRZ, for example, pushes Lenovo to drive performance in our standard products. The base technology in LRZ’s SuperMUC-NG (#8 on the November 2018 Top500 list) is available worldwide to all our customers today.
Lenovo will continue to co-design with customers like LRZ to help drive next-generation base exascale system design. Lenovo’s approach of “cascading” computing advancements is at the core of who we are: leveraging our deep partnerships and skills to advance computing, and then making those advancements available to all our customers, is our ultimate goal. We adhere to designs that follow industry standards in everything from infrastructure and form factors to software and systems management. We want to continue the legacy of Roadrunner by ensuring that advancements in HPC are available to everyone. Instead of looking backwards and developing purpose-built, proprietary systems that only a few can afford, Lenovo will continue to ensure that all users reap the benefits of technology innovation as it happens. Our goal is to make exascale available to everyone.
Matt Ziegler is Director HPC & AI Product Management, HPC Product Marketing at Lenovo.