Co-design Architecture for Data Analytics And Machine Learning
The big data analytics market has seen rapid growth in recent years. Part of this trend is the increased use of machine learning (deep learning) technologies, whose speed has been drastically increased through the use of GPU accelerators. The issues facing the HPC market are similar to those facing the analytics market: efficient use of the underlying hardware. A position paper from the third annual Big Data and Extreme Computing conference (2015) illustrates the power of co-design in the analytics market.
This is the fifth and final article in a series from the insideHPC Guide to Co-Design Architectures.
The Terasort contest lists the top platforms for sorting 100 TB of data. The number 1 platform runs vanilla Hadoop (which is agnostic to the hardware it runs on) on 2,100 nodes with 12 cores and 64 GB of memory per node (24,000 cores and 134 TB of memory in total), completing the sort in 4,300 seconds. The number 2 platform (TritonSort) is optimized for its hardware and written in C; it uses 52 nodes with 8 cores and 24 GB per node (416 cores and 1.2 TB of memory in total) and finishes in 8,300 seconds. This shows the importance of hardware/software co-design for big data: vanilla Hadoop may be easy to program, but it needs roughly 57X more cores and 100X more memory to achieve only 2X the performance.
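The ratios quoted above are easy to verify. The short Python sketch below simply recomputes them from the totals stated in the paragraph; it is an arithmetic check only, not part of either benchmark submission.

```python
# Back-of-the-envelope check of the ratios quoted above, using only the
# totals stated in the text (core counts, memory sizes, and run times).
hadoop     = {"cores": 24_000, "memory_tb": 134.0, "time_s": 4_300}
tritonsort = {"cores": 416,    "memory_tb": 1.2,   "time_s": 8_300}

core_ratio   = hadoop["cores"] / tritonsort["cores"]          # ~57.7x more cores
memory_ratio = hadoop["memory_tb"] / tritonsort["memory_tb"]  # ~111.7x more memory
speedup      = tritonsort["time_s"] / hadoop["time_s"]        # ~1.9x faster sort

print(f"cores: {core_ratio:.1f}x, memory: {memory_ratio:.1f}x, speed: {speedup:.1f}x")
```

Running it shows roughly 57X the cores and well over 100X the memory for only about a 2X faster sort, consistent with the figures above.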
Similar to HPC applications, higher-level software interfaces were designed to insulate analytics developers from hardware details. When performance is important, however, these interfaces can turn into bottlenecks. Many analytics applications also have distinctive communication patterns and processor/memory usage. Examining these patterns and employing a co-design approach provides a way to expand problem size without simply adding more hardware, which eventually yields diminishing returns.
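One way co-designed hardware relieves these interface bottlenecks is by letting the network make progress on collective operations while the CPU keeps computing, the kind of capability offload hardware such as Core-Direct and SHArP is built to accelerate. The sketch below is a minimal, hypothetical illustration of that overlap pattern using mpi4py non-blocking collectives (assuming mpi4py and NumPy are installed and the script is launched under mpirun); it is not taken from any of the benchmark codes discussed here, and the array sizes are arbitrary.

```python
# Minimal sketch: overlap independent computation with a non-blocking
# collective so the interconnect, not the host CPU, drives the reduction.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Local partial result that must be summed across all ranks.
local = np.random.rand(1_000_000)
total = np.empty_like(local)

# Start the reduction and let the network/MPI library progress it ...
req = comm.Iallreduce(local, total, op=MPI.SUM)

# ... while the CPU keeps working on something independent.
independent_work = np.sort(np.random.rand(1_000_000))

# Only block when the reduced result is actually needed.
req.Wait()
```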
Conclusion
A single issue has always defined the history of HPC systems: performance. While offloading and co-design may seem like new approaches to computing, they have in fact been used in the past, to a lesser degree, as a way to enhance performance. Current co-design methods are now going deeper into cluster components than was previously possible. These new capabilities extend from the local cluster nodes into the “computing network.”
The first supercomputer systems were designed for a general class of problem: those that require large numbers of mathematical operations. This specialization continued through the various epochs and has resulted in a blurring of the hardware and software silos that often define computer systems. Indeed, the co-design approach brings together developers and hardware designers to “purpose build” a computing machine around an important problem (or problem set). This technique is contrary to the existing situation, where developers are required to mold or adapt their problems to whatever hardware shows up in the data center. Exascale systems will leverage co-design methodologies to provide the next level of production FLOPS for important HPC applications.
Co-design methods are not limited to high-end systems. Designing a machine using the latest components (CPU, GPU, storage, and network) and protocols (UCX, FCA/Core-Direct, SHArP, and CCIX, when available) will provide unprecedented flexibility to end-users. Important applications can be co-designed, and reference hardware designs can be made available by vendors. As the industry enters the co-design epoch, new levels of computing efficiency and performance will become available.
Over the previous weeks we have explored each of these topics in detail:
- Designing Machines Around Problems: The Co-Design Push to Exascale
- The Evolution of HPC
- The First Step in Network Co-design: Offloading
- Network Co-design as a Gateway to Exascale
- Co-design for Data Analytics And Machine Learning (this article)
If you prefer, you can download the insideHPC Guide to Co-Design Architectures from the insideHPC White Paper Library.