Interview: Why Co-design is the Path Forward for Exascale Computing

Gilad Shainer, VP of Marketing, Mellanox

In this video, Gilad Shainer from Mellanox describes how co-design is the path forward to exascale computing.

“Co-Design is a collaborative effort among industry thought leaders, academia, and manufacturers to reach Exascale performance by taking a holistic system-level approach to fundamental performance improvements. Co-Design architecture enables all active system devices to become acceleration devices by orchestrating a more effective mapping of communication between devices in the system. This produces a well-balanced architecture across the various compute elements, networking, and data storage infrastructures that exploits system efficiency and even reduces power consumption.”

According to Shainer, Exascale computing will include three primary concepts: heterogeneous systems, direct communication through a more sophisticated intelligent network, and backward/forward compatibility. Co-Design includes these concepts in order to create an evolutionary architectural approach that will enable Exascale-class systems.

Transcript:

insideHPC: Gilad, you started a series of articles recently on insideHPC about Co-design. The first one is entitled “InfiniBand Enables Intelligent Networks.” What is the important message here, and where is this going?

Gilad Shainer: First, Co-design is not a Mellanox thing. Co-design is something driven by the industry and the users. Essentially, if you look at the history of development in high-performance computing, we went through several technology changes. We went from SMP to clusters, for example, and then we went from single-core to multi-core. And now we’re going through another technology change; we call it Co-design right now. All those technology changes essentially were made because we came to a performance bottleneck. And then you need to find something that’s going to solve that performance bottleneck.

The move from single-core to multi-core, everyone knows, was needed for performance gain because the frequency increase of the CPU could not continue. So you couldn’t get the processor to run any faster. So the solution was, “Okay, we cannot get the processor to run faster, so let’s have more processes running in parallel. Let’s move from single-core to multi-core and let’s do more things in parallel, because we cannot speed up the processor itself.” That was the solution back then.

Now, essentially, multi-core has become a performance bottleneck. And this is what Co-design is designed to solve. The reason multi-core becomes a performance bottleneck is that with multi-core you can run more things in parallel, which is fine. And today, because of multi-core, you can find GPUs or co-processors that have hundreds and thousands of cores. So you can break the job into very small pieces and run them in parallel, but you cannot make the processes running on those cores any faster, regardless of how many more cores you add. So this is where the performance bottleneck is. And essentially Co-design comes to solve that performance bottleneck.

The idea of Co-design is, OK, let’s stop looking at the CPU as the answer for everything, because this is where my performance bottleneck is today. Let’s look at the application side, and let’s look at all the algorithms that we’re using and running and see exactly what’s the best fit for where to run those algorithms. And basically, the key today to getting high application performance is actually to start working on the data when the data moves. Don’t wait for the data to reach the CPU and only then work on it, but actually start working on the data when the data moves. This is the only way you can accelerate processes or run processes faster. That’s the only way today.

So multi-core was a great solution several years ago. Today multi-core is the performance bottleneck. In order to solve that performance bottleneck, you need to create the new co-processor. And the new co-processor essentially is the network, because moving intelligence to the network enables you to start working on the data when the data moves, and if you work on the data when the data moves, you can do things faster and more efficiently. And the network is going to become the new co-processor for the next few years. So that’s the idea behind Co-design. There are multiple vendors that are working together with users. And there is a lot of energy going into developing new intelligent capabilities within the network in order to achieve those elements.

This is the basis of the series of articles that we are writing, and thank you very much for giving us the opportunity to post them. Going forward, you will see more users also post work that is being done around Co-design, as this is essentially the key element to move us to the next level of performance.
