A New Direction in HPC System Fabric: Intel’s Omni-Path Architecture

In this special guest feature, John Kirkley writes that Intel is using its new Omni-Path Architecture as a foundation for supercomputing systems that will scale to 200 Petaflops and beyond.

When it comes to the architecture of high performance computing (HPC) systems, processors and storage solutions often take center stage. With Exascale waiting in the wings, the focus is on the scalability, massive parallelism, resiliency and power efficiency that these systems can provide. But one essential component of these new generations of powerful supercomputers and HPC clusters is moving further into the spotlight every day: high-bandwidth, low-latency interconnects that can handle the transfer of vast amounts of data throughout the computing infrastructure.

There are a number of HPC interconnect solutions in the market today that are based on variations of InfiniBand and Ethernet, but a new kid just showed up on the block. Intel has announced that it will make its next generation fabric, the Intel Omni-Path Architecture, available later this year, bringing a brand new set of capabilities to the HPC community.

“The Intel Omni-Path Architecture is an evolutionary design with revolutionary features,” says Joe Yaworski, Intel Director of Fabric Marketing for the HPC Group.

Yaworski notes that this next generation fabric builds on the best features of the company’s successful True Scale Fabric. (True Scale was developed by Intel based on its acquisition of QLogic’s InfiniBand product line several years ago.)

With its ability to scale to tens and eventually hundreds of thousands of nodes, the Intel Omni-Path Architecture is designed for tomorrow’s HPC workloads. The platform has its sights set squarely on Exascale performance while supporting more modest, but still demanding, future HPC implementations.

Intel’s Omni-Path Fabric addresses some very specific problems associated with today’s HPC environments. One example is how to deliver the fabric bandwidth and performance required by the next generation of many-core, highly parallel processors, such as Intel’s Knights Landing and its successor, Knights Hill, the next generation Xeon Phi processor named in the 180 PetaFLOPS Aurora announcement.

Today’s traditional fabrics are saddled with some major limitations, including a high price tag. Summarizing some presentations I’ve recently seen, when you look at the cost of fabric as a percentage of the overall HPC budget, InfiniBand FDR is running somewhere between 25% and 35%. And with the introduction of the 100Gb/s EDR InfiniBand, the fabric’s piece of the pie will jump to between 35% and 50% of the HPC budget. That’s just not viable in the long run, and limits a customer’s ability to maximize their compute horsepower within a fixed budget.
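The budget arithmetic behind these percentages is simple, but it makes the squeeze concrete. The sketch below assumes a hypothetical $10 million budget, which is not a figure from the article. It treats everything not spent on the fabric as available for compute and the rest of the infrastructure.

```python
def non_fabric_budget(total_budget, fabric_fraction):
    """Dollars left for compute and everything else once the fabric is paid for."""
    if not 0.0 <= fabric_fraction <= 1.0:
        raise ValueError("fabric_fraction must be between 0 and 1")
    return total_budget * (1.0 - fabric_fraction)


TOTAL_BUDGET = 10_000_000  # hypothetical fixed HPC budget in US dollars

# Low and high ends of the fabric-cost ranges quoted in the article.
scenarios = [
    ("FDR, low end", 0.25),
    ("FDR, high end", 0.35),
    ("EDR, low end", 0.35),
    ("EDR, high end", 0.50),
]

for label, fraction in scenarios:
    remaining = non_fabric_budget(TOTAL_BUDGET, fraction)
    print(f"{label:14} fabric {fraction:.0%}: ${remaining:,.0f} left")
```

Going from the low end of the FDR range to the high end of the EDR range cuts the non-fabric budget from $7.5 million to $5 million. That is a one-third reduction in what remains for compute.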

Because processor capacity and memory bandwidth are scaling faster than system I/O, a new solution is needed. It must reliably provide higher available bandwidth per socket to handle high priority Message Passing Interface (MPI) traffic as well as large storage data sets.

Also needed is an integrated fabric controller to help reduce the cost and space requirements of discrete cards. This would enable higher server density and free up space to maximize I/O density for other networking and storage HPC infrastructure components. Discrete cards are also notorious power hogs, so an integrated interface with fewer discrete components will help ease that burden.

High on the HPC interconnect wish list is the ability to exert very fine grained control over the traffic moving across the fabric, a capability lacking in traditional solutions. This kind of precise control is required to deliver the data the processor needs, when it needs it, effectively and efficiently. It is particularly important in mixed traffic environments, where large storage packets sent at the full MTU (maximum transmission unit) size can block the transmission of smaller, higher priority MPI messages. (MPI is the primary messaging interface for most HPC applications.)

Another factor is scalability. Says Ed Groden, Intel product marketing manager, “As we move toward Exascale, we need a fabric that is resilient enough to support 150,000 to 250,000 nodes. The combination of faster line speeds and increased scale means more errors, something we have to deal with efficiently.”
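Groden's point about errors follows from simple arithmetic. If every link has the same bit error rate, the expected number of bit errors per second across the fabric grows with both the number of links and the line rate. The sketch makes two assumptions, neither of which comes from Intel. First, it assumes one host link per node and ignores switch-to-switch links, which would only raise the count. Second, it assumes a hypothetical bit error rate of 1e-15.

```python
def fabric_bit_errors_per_second(num_links, link_gbps, bit_error_rate):
    """Expected bit errors per second summed over all links in the fabric."""
    bits_per_second = num_links * link_gbps * 1e9
    return bits_per_second * bit_error_rate


BER = 1e-15  # hypothetical per-link bit error rate, for illustration only

# (links, line rate in Gb/s): a roughly FDR-class cluster vs. Exascale-class fabrics
for links, gbps in [(10_000, 56), (150_000, 100), (250_000, 100)]:
    errors = fabric_bit_errors_per_second(links, gbps, BER)
    print(f"{links:>7,} links at {gbps:>3} Gb/s: {errors:6.2f} errors/s, "
          f"one every {1 / errors:.3f} s")
```

Under these assumptions the largest fabric sees roughly 45 times as many errors per second as the smallest (25 versus 0.56). This is why recovering from errors cheaply matters as systems grow.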

Groden adds that routing is another major consideration. “We need to be able to get messages through the fabric in the fastest and cleanest way possible. In a mixed traffic environment we have to be able to prioritize big storage messages and smaller MPI compute messages to avoid logjams and make the most effective use of both deterministic and adaptive routing.”

Higher Bandwidth, Lower Latency

By taking an evolutionary approach, the Intel Omni-Path Architecture provides the higher performance and scalability required by today’s and tomorrow’s advanced HPC systems, with their constantly growing workloads. As a next generation fabric, Omni-Path will deliver impressive bandwidth of 100Gbps per port, extremely low latency that remains low even at extreme scale, and significantly higher message rates.

Omni-Path builds on the best features of Intel’s widely adopted True Scale Fabric and InfiniBand. For example, True Scale is noted for its high MPI message rates. Explains Intel’s Yaworski, “True Scale breaks up the problem into discrete messages and sends them out to all the nodes, sockets and cores that are working on that particular problem. So, a very high message rate allows the data to be driven to other nodes very efficiently. True Scale’s outstanding ability to handle small messages becomes very important as you scale up the problem and distribute it over an increasing number of nodes, sockets and cores.”

Yaworski also points out that True Scale uses a connectionless design. As a result, latency remains consistently low as you scale, regardless of the level of scale and the type of messaging pattern being employed.
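One way to see why a connectionless design helps at scale is to count transport state. The sketch below is a simplified model and does not describe True Scale internals. It assumes an all-to-all MPI job with a hypothetical 16 ranks per node. In this model, a connection-oriented transport keeps one connection per remote process, while a connectionless transport keeps one endpoint per local process. Real connection-oriented stacks reduce this with techniques such as on-demand connection setup and shared receive queues, which the model ignores.

```python
def endpoints_per_node(nodes, procs_per_node, connectionless):
    """Transport endpoints one node holds for an all-to-all MPI job (simplified model)."""
    if connectionless:
        # one endpoint per local process, independent of job size
        return procs_per_node
    remote_procs = (nodes - 1) * procs_per_node
    # every local process keeps a dedicated connection to every remote process
    return procs_per_node * remote_procs


PROCS_PER_NODE = 16  # hypothetical MPI ranks per node

for nodes in (100, 10_000, 150_000):
    conn = endpoints_per_node(nodes, PROCS_PER_NODE, connectionless=False)
    less = endpoints_per_node(nodes, PROCS_PER_NODE, connectionless=True)
    print(f"{nodes:>7,} nodes: connection-oriented {conn:>14,}, connectionless {less}")
```

In this model the connection-oriented count grows linearly with node count, while the connectionless count stays flat. Per-connection state that grows with the job is one commonly cited reason latency can degrade at scale.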

True Scale also uses a layer called PSM (Performance Scaled Messaging), which drives the very high messaging rate. Because PSM is better matched to MPI, it communicates more efficiently than InfiniBand verbs and requires fewer CPU cycles to complete the communication. Its streamlined interface means that PSM transmits small messages efficiently and quickly, ensuring that latency stays low despite scaling and variable message patterns.

As a next generation fabric architecture, Omni-Path builds on these True Scale strengths and adds revolutionary features of its own.

For example, in order to deliver the bandwidth and performance required by next generation CPUs such as Knights Landing and Knights Hill, Intel engineers are driving the fabric increasingly closer to the CPU. This deals directly with the fact that processor performance has evolved more quickly than the supporting fabric, resulting in the higher relative cost of the fabric compared to the CPU.

“Providing more bandwidth and more efficiently delivering data to the CPU is a key attribute of Omni-Path,” comments Yaworski. “So when designing next generation fabric, you need to minimize power, cost and density, while providing better data movement and error detection at scale.”

“Customers using next generation processors want a fabric that is balanced with the CPU and its memory architecture,” he adds. “Omni-Path not only drives the fabric closer to the processor, but also improves its integration with the CPU.”

Yaworski identified five value vectors resulting from this close CPU/fabric integration:

Higher performance – fundamental to supplying all the servers that make up an Exascale deployment
Higher density – Omni-Path delivers more performance without adding PCIe slots
Reduction in power – A key factor in Exascale deployments
Reduction in cost – Not only are no additional PCIe slots required, but in some cases the PCIe bus can be removed from the front end of the communications path
Greater reliability – Again, by reducing or eliminating components and integrating the CPU and fabric, it becomes possible to achieve the level of reliability needed to reach Exascale.

Omni-Path is optimized for HPC, providing very high MPI message rates and better overall performance and scalability. Combine this with an architecture that retains low end-to-end latency at scale and with enhancements for file system traffic, and the result, according to Intel, is a platform that can move twice the amount of data in a transfer compared to InfiniBand.

Omni-Path’s switching fabric ties everything together. It provides extremely high traffic throughput on each of the switch ports, minimizing latency. In addition, the fabric has a very efficient automatic error detection and correction mechanism that avoids wasteful end-to-end retries and the latency they add. (InfiniBand EDR is equally good at detecting single- and multi-byte errors, but its error handling adds unwanted latency.)
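A first-order model can illustrate the benefit of correcting errors hop by hop rather than end to end. The per-hop corruption probability, replay cost, path length and end-to-end timeout below are hypothetical placeholders, not Omni-Path or InfiniBand specifications. The model also assumes that a corrupted packet is retried at most once.

```python
def expected_retry_ns(hops, p_corrupt_per_hop, hop_replay_ns, e2e_timeout_ns,
                      link_level):
    """First-order expected extra latency per packet caused by error recovery.

    Assumes a corrupted packet is retried at most once.
    """
    if link_level:
        # each hop detects corruption and replays the packet over that one link
        return hops * p_corrupt_per_hop * hop_replay_ns
    # end to end: corruption anywhere on the path forces a resend from the
    # source after a timeout, and the resend crosses every hop again
    p_path = 1 - (1 - p_corrupt_per_hop) ** hops
    return p_path * (e2e_timeout_ns + hops * hop_replay_ns)


HOPS = 5            # hypothetical path length through the switch fabric
P_CORRUPT = 1e-4    # hypothetical chance a packet is corrupted on one hop
HOP_REPLAY = 100    # hypothetical ns to send a packet across one link
TIMEOUT = 10_000    # hypothetical ns end-to-end retransmission timeout

link = expected_retry_ns(HOPS, P_CORRUPT, HOP_REPLAY, TIMEOUT, link_level=True)
e2e = expected_retry_ns(HOPS, P_CORRUPT, HOP_REPLAY, TIMEOUT, link_level=False)
print(f"link-level: {link:.3f} ns/packet, end-to-end: {e2e:.3f} ns/packet")
```

The averages are small either way, although the end-to-end figure is about 100 times larger here. The bigger difference is in the tail. Under these assumptions, an unlucky packet waits about 100 ns for a link-level replay but more than 10 µs for an end-to-end resend.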

The fine grain control of the data moving over the Intel fabric also allows users to add Quality of Service (QoS) features to the flow. Data can be sliced into small increments allowing users to make more accurate and appropriate decisions about traffic priorities.
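A simple preemption model can illustrate the effect of slicing data into small increments. Suppose a high-priority MPI message arrives while a large storage packet is on the wire, and the link can switch to the MPI message only at a chunk boundary. The 4096-byte packet and 128-byte chunk size are hypothetical. The 100 Gb/s rate matches the per-port figure quoted earlier. The article does not state the increment size Omni-Path actually uses.

```python
def mpi_wait_ns(bytes_already_sent, packet_bytes, chunk_bytes, link_gbps):
    """Delay before a high-priority message can start, in nanoseconds.

    The low-priority packet is sent as back-to-back chunks of chunk_bytes, and
    the link may switch to the high-priority message only between chunks.
    Setting chunk_bytes == packet_bytes models a link that cannot preempt.
    """
    if bytes_already_sent >= packet_bytes:
        return 0.0  # the link is already free
    # end of the chunk currently on the wire (integer ceiling division)
    next_boundary = -(-bytes_already_sent // chunk_bytes) * chunk_bytes
    next_boundary = min(next_boundary, packet_bytes)
    # bits divided by Gb/s gives nanoseconds
    return (next_boundary - bytes_already_sent) * 8 / link_gbps


PACKET = 4096   # hypothetical storage packet size in bytes
CHUNK = 128     # hypothetical preemption increment in bytes
LINK = 100      # Gb/s per port

sent = 100      # bytes of the storage packet already transmitted
for label, chunk in [("no preemption", PACKET), ("128-byte increments", CHUNK)]:
    print(f"{label:>20}: {mpi_wait_ns(sent, PACKET, chunk, LINK):.2f} ns")
```

Under these assumptions, the MPI message waits about 320 ns without preemption and a little over 2 ns with 128-byte increments. With chunking, the worst case is bounded by one chunk time (about 10 ns here) rather than one full packet time.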

Breaking it down

Here’s a brief overview of the four main components that make up the Omni-Path product line:

Host Fabric Interface (HFI) ASIC – HFI will be used in Intel-branded PCIe-based adapters, will be integrated into both Xeon and Xeon Phi processors, and will also be used in third party products.
Switch ASIC – The switch silicon will be used in Intel Edge Switch products and Intel Director Class Switch products, and will also be used in third party switches.
Software components – Host side drivers, switch management firmware, and Intel Fabric Suite fabric management software
Intel Silicon Photonics transceivers and cables – Silicon photonics transceivers integrated into the Director Class Switch (DCS), enabling high-density switch products and simplified cabling

Leveraging the OpenFabrics Alliance

The Intel Omni-Path Architecture strategy is to leverage the OpenFabrics Alliance (OFA) software architecture. This allows Omni-Path to be compatible with the large base of True Scale and InfiniBand applications that have been developed over time. The OFA also provides access to the set of mature protocols contained in the OFED (OpenFabrics Enterprise Distribution) release.

Omni-Path has a lot going for it. It’s a complete end-to-end solution with a full range of switches, adapters, software, cabling and silicon. It is an optimized CPU, host and fabric architecture that scales cost-effectively from entry-level to extreme deployments. Omni-Path is compatible with existing True Scale Fabric and OpenFabrics Alliance APIs.

It looks like any HPC application will benefit from Omni-Path, particularly communications-intensive MPI-based apps. For example, the advanced fabric will play a major role in supporting the many uses of modeling and simulation. These include compute-intensive workflows based on computational fluid dynamics (CFD) and Finite Element Analysis (FEA) that solve problems in manufacturing, the life sciences (including genomics and molecular dynamics), oil and gas exploration, and grand challenges in physics and astrophysics.

Any organization that relies on HPC for design, simulation and prototyping will find that Omni-Path deserves a close evaluation for architecting a balanced platform that can take on the most demanding workflows.

According to Intel, the Intel Omni-Path Architecture will be introduced in the fourth quarter of 2015.

Editor’s Note: Intel will be disclosing additional details about Omni-Path in a webinar that will be conducted live from ISC on Monday, July 13.
