“The move away from the traditional single processor/memory design has fostered new programming paradigms that address multiple processors (cores). Existing single-core applications need to be modified to use extra processors (and accelerators). Unfortunately, there is no single portable and efficient programming solution that addresses both scale-up and scale-out systems.”
The two methods of scaling processors, scale-out and scale-up, are distinguished by how the memory architecture is scaled. Beyond the basic processor/memory architecture, accelerators and parallel file systems are also used to provide scalable performance. “High performance scale-up designs for scaling hardware require that programs have concurrent sections that can be distributed over multiple processors. Unlike the distributed memory systems described below, there is no need to copy data from system to system because all the memory is globally usable by all processors.”
To achieve high performance, modern computer systems rely on two basic methodologies to scale resources: scale-up or scale-out. The scale-up in-memory system provides a much better total cost of ownership and can provide value in a variety of ways. “If the application program has concurrent sections, then it can be executed in a “parallel” fashion, much like using multiple bricklayers to build a brick wall. It is important to remember that the amount and efficiency of the concurrent portions of a program determine how much faster it can run on multiple processors. Not all applications are good candidates for parallel execution.”
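The dependence of speedup on the concurrent portion of a program is commonly expressed with Amdahl's law. The sketch below is illustrative only and is not taken from the article: it assumes a fixed fraction of the run time can be parallelized, that this fraction scales ideally across processors, and that there is no communication overhead. The function name `amdahl_speedup` and the sample fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of single-processor run time that can run
                       concurrently (0.0 to 1.0). Assumed, not measured.
    processors:        number of processors applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many processors
    # are used; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical parallel fractions and processor counts.
    for fraction in (0.50, 0.90, 0.99):
        for count in (4, 64, 1024):
            speedup = amdahl_speedup(fraction, count)
            print(f"parallel={fraction:.2f} processors={count:5d} "
                  f"speedup={speedup:8.2f}")
```

Under these assumptions, a program that is 90 percent concurrent can never run more than 10 times faster, no matter how many processors (or bricklayers) are added, because the remaining serial 10 percent always takes the same time.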
The big data analytics market has seen rapid growth in recent years. Part of this trend includes the increased use of machine learning (deep learning) technologies. Indeed, machine learning speed has been drastically increased through the use of GPU accelerators. The HPC market faces the same issue as the analytics market: efficient use of the underlying hardware. A position paper from the third annual Big Data and Extreme Computing conference (2015) illustrates the power of co-design in the analytics market.
Achieving better scalability and performance at Exascale will require full data reach. Without this capability, onload architectures force all data to move to the CPU before allowing any analysis. The ability to analyze data everywhere means that every active component in the cluster will contribute to the computing capabilities and boost performance. In effect, the interconnect will become its own “CPU” and provide in-network computing capabilities.
The move to network offloading is the first step in co-designed systems. Servicing the huge number of packets generated at modern data rates requires a large amount of processing overhead, and this overhead can significantly reduce network performance. Offloading network processing to the network interface card helped solve this bottleneck as well as some others.
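A rough back-of-the-envelope calculation shows why per-packet overhead matters at modern data rates. The sketch below is an illustration under stated assumptions, not a measurement: it assumes a fully utilized link, a fixed packet size, and it ignores framing overhead such as preambles and inter-packet gaps. The function names and the sample link speeds are hypothetical.

```python
def packets_per_second(link_gbps: float, packet_bytes: int) -> float:
    """Packets per second on a saturated link (framing overhead ignored)."""
    bits_per_packet = packet_bytes * 8
    return link_gbps * 1e9 / bits_per_packet


def per_packet_budget_ns(link_gbps: float, packet_bytes: int) -> float:
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / packets_per_second(link_gbps, packet_bytes)


if __name__ == "__main__":
    # Hypothetical link speeds (Gb/s) and packet sizes (bytes).
    for gbps in (10, 100, 200):
        for size in (64, 1500):
            rate = packets_per_second(gbps, size)
            budget = per_packet_budget_ns(gbps, size)
            print(f"{gbps:4d} Gb/s, {size:5d}-byte packets: "
                  f"{rate / 1e6:8.2f} Mpps, {budget:8.2f} ns per packet")
```

Under these assumptions, a 100 Gb/s link carrying 1500-byte packets delivers about 8.3 million packets per second, which leaves roughly 120 nanoseconds to handle each one. If the host CPU must service every packet, little time remains for application work, which is the motivation for moving that processing onto the network interface card.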
“When the history of HPC is viewed in terms of technological approaches, three epochs emerge. The most recent epoch, that of co-design systems, is new and somewhat unfamiliar to many HPC practitioners. Each epoch is defined by a fundamental shift in design, new technologies, and the economics of the day.” “A network co-design model allows data algorithms to be executed more efficiently using smart interface cards and switches. As co-design approaches become more mainstream, design resources will begin to focus on specific issues and move away from optimizing general performance.”
A single issue has always defined the history of HPC systems: performance. While offloading and co-design may seem like new approaches to computing, they actually have been used, to a lesser degree, in the past as a way to enhance performance. Current co-design methods are now going deeper into cluster components than was previously possible. These new capabilities extend from the local cluster nodes into the “computing network.”
Today’s High Performance Computing (HPC) systems offer the ability to model everything from proteins to galaxies. The insights and discoveries offered by these systems are nothing short of astounding. Indeed, the ability to process, move, and store data at unprecedented levels, often reducing jobs from weeks to hours, continues to move science and technology forward at an accelerating pace. This article series offers those considering HPC, both users and managers, guidance on the best way to deploy an HPC solution.
Successful HPC depends on choosing the architecture that addresses both application and institutional needs. In particular, finding a simple path to leading-edge HPC and data analytics is not difficult if you consider the capabilities and limitations of the various approaches with respect to performance, scaling, ease of use, and time to solution. Careful analysis and consideration of the following questions will help lead to a successful and cost-effective HPC solution. Here are three questions to ask to ensure HPC success.