What’s Next for HPC? A Q&A with Michael Kagan, CTO of Mellanox

As an HPC technology vendor, Mellanox is in the business of providing the leading-edge interconnects that drive many of the world’s fastest supercomputers. To learn more about what’s new for SC16, we caught up with Michael Kagan, CTO of Mellanox.

insideHPC: It seems like the HPC technology landscape is changing rapidly. What is on your agenda to accomplish at SC16?

Michael Kagan, CTO of Mellanox Technologies

Michael Kagan: The ever-growing demand for higher performance drives technology innovations for HPC, which then spread to other markets. We have witnessed several technology transitions over the years, such as the transition from SMP to clusters, or from single core to multi-core. We are now going through another technology transition, which some call Co-Design. There are many technology efforts to re-architect the data center from a CPU-centric architecture to a data-centric architecture in order to overcome the new performance bottlenecks. The new data centers will need to allow data operations and analysis everywhere in order to get insights in real time. This is the key to many of the emerging applications, and fundamental to enhancing research and discovery. Mellanox is a key player in developing the new technology that will enable data analysis at the network level, so that data can be analyzed as it moves. The network is becoming a processor, delivering In-Network Computing to the applications.

At SC16, Mellanox will showcase the latest capabilities for in-network computing, and how the new data center architecture will enable us, the HPC community, to reach our future goals.

insideHPC: We’ve read a lot about networking architectures in terms of offloading vs. onloading. Why is this such an important topic for Mellanox going forward?

Michael Kagan: Offloading versus onloading is not a Mellanox topic; it is key to the Co-Design technology transition. The only way to enable real-time data analysis is to analyze the data wherever it is, including in the network. Onloading is an old concept created in the first days of the multi-core transition, under the claim that data center applications would not be able to leverage the number of cores available, and that the spare cores could therefore be used for non-computational tasks, such as managing the network. We all know that this claim is false: one wants to maximize data center performance and efficiency, not to mention gain a competitive advantage.

insideHPC: Machine Learning is a hot topic this year at SC16. How do Mellanox technologies accelerate Machine Learning and where are you headed with that?

Michael Kagan: Machine learning applications are very similar in their demands to high-performance computing applications. The system architecture needed for machine learning applications is the same as the one needed for HPC. Therefore, the technologies that drive HPC performance, efficiency and scalability are the same ones leveraged by machine learning platforms. Mellanox solutions are being used in the vast majority of the machine learning systems out there.

The ability to manage and execute data operations in the network is a key technology here. Mellanox SHArP technology enables users to execute data aggregation and data reduction protocols as the data is being transmitted in the fabric. This is key for enhancing MPI and SHMEM/PGAS performance on one side, and machine learning applications on the other side. We invite SC16 attendees to visit the Mellanox booth to learn more about how Mellanox can help accelerate machine learning applications.
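To make the idea of reducing data while it travels through the fabric more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not Mellanox's SHArP implementation: the `Node` class, the two-level tree of eight hosts, and the `reduce_in_network` function are all hypothetical names chosen for illustration. Each switch sums the values arriving from below and forwards a single partial result upward, so the root combines two partial sums instead of eight raw values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical tree node: a leaf models a host, an inner node a switch."""
    name: str
    value: float = 0.0  # only meaningful for hosts (leaves)
    children: List["Node"] = field(default_factory=list)


def reduce_in_network(node: Node, received: Dict[str, int]) -> float:
    """Sum host values bottom-up.

    Every switch adds up what its children send and forwards one partial
    sum. `received` records how many messages each switch had to combine.
    """
    if not node.children:
        return node.value
    partial = 0.0
    for child in node.children:
        partial += reduce_in_network(child, received)
        received[node.name] = received.get(node.name, 0) + 1
    return partial


def build_tree() -> Node:
    """Assumed topology: 8 hosts, 2 leaf switches, 1 root switch."""
    hosts = [Node(f"host{i}", value=float(i + 1)) for i in range(8)]
    leaves = [Node(f"leaf{j}", children=hosts[4 * j:4 * j + 4]) for j in range(2)]
    return Node("root", children=leaves)


received: Dict[str, int] = {}
total = reduce_in_network(build_tree(), received)
print(total)     # 36.0
print(received)  # {'leaf0': 4, 'leaf1': 4, 'root': 2}
```

In this toy model no single node ever combines more than four values, whereas a naive gather to one host would deliver seven messages to that host and leave all the arithmetic to its CPU.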

insideHPC: Mellanox is currently shipping EDR InfiniBand. What comes next and how will it change the supercomputing landscape?

Michael Kagan: It seems like we just introduced EDR 100Gb/s, but we are approaching HDR 200Gb/s and plan to enable such solutions in the 2017 time frame. Moving beyond EDR to HDR is critical not only for HPC, but also for the numerous industries that are adopting AI and Big Data to make real business sense out of the amount of data that is available and that we continue to collect on a daily basis. Supercomputing today may not be too dissimilar from what it was a decade ago; of course, we are integrating new capabilities and are now working on more complex problems than ever before. HDR 200Gb/s will be a pivotal technology across the HPC arenas. It will not only double the available bandwidth, but will also be very tightly coupled with the key computing elements, and will itself be able to compute and work on the data. As the network becomes a fundamental co-processing element, as a GPU or an FPGA is, it will alleviate several of the bottlenecks, not only at the system level but also from the perspective of the data center. You will no longer have to move the data, or huge chunks of it, to begin working on it. Mellanox will be the first to introduce capabilities into the network that address not only the fundamental issues with communication overhead for HPC, but also the Big Data conundrum, which is having more data than you can realistically analyze and use in a timely manner.

insideHPC: Mellanox is part of the recently announced OpenCAPI Consortium. How does this fit in with InfiniBand?

Michael Kagan: Mellanox participates in and supports all three recently announced consortiums: OpenCAPI, CCIX and Gen-Z. We endorse open standards that can improve future data center performance, and do not believe in proprietary and closed solutions. We believe that the consortiums will create specifications that will improve CPU/memory/IO communications and will leverage Ethernet and InfiniBand technologies to enable the best possible platform connectivity. Mellanox plans to incorporate technologies from the new specifications in our future products as they become viable.

###

This Industry Perspective is just one of the great features in the new Print ‘n Fly Guide to SC16 in Salt Lake City. Inside this guide you will find technical features on supercomputing, HPC interconnects, and the latest developments on the road to exascale. It also has great recommendations on food, entertainment, and transportation in SLC.

Print ‘n Fly Table of Contents

Download the Guide to SC16 (PDF)