In this video from the 2016 HPC Advisory Council Switzerland Conference, Elijah Charles from Intel presents: Using Xeon + FPGA for Accelerating HPC Workloads.
“The Exascale computing challenge is the current Holy Grail for high performance computing. It envisages building HPC systems capable of 10^18 floating point operations per second under a power input in the range of 20-40 MW. To achieve this feat, several barriers need to be overcome. These barriers or “walls” are not completely independent of each other, but present a lens through which HPC system design can be viewed as a whole, and its composing sub-systems optimized to overcome the persistent bottlenecks.”
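The exascale target quoted above implies a concrete efficiency budget. A quick back-of-envelope calculation, assuming the 20 MW lower bound of the stated power range, shows the energy efficiency such a system must reach:

```python
# Back-of-envelope check of the exascale efficiency target.
# Assumes the 20 MW lower bound quoted above; the 40 MW upper
# bound would halve the required efficiency.
exaflops = 1e18        # floating point operations per second
power_budget_w = 20e6  # 20 MW expressed in watts

gflops_per_watt = exaflops / power_budget_w / 1e9
print(gflops_per_watt)  # 50.0 GFLOPS per watt
```

Reaching roughly 50 GFLOPS per watt across an entire system, not just the compute chips, is what makes power one of the hardest of the walls listed below.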
In summary, these walls are:
- Technology Scaling
- Compute or Instruction Execution
- Memory Bandwidth
- Network/Interconnect Bandwidth
- Power/Energy consumed and dissipated
- Utilization aka Dark Silicon
- Programmability

Ultimately, all computer architectures follow a high-level organization in which compute, memory, and interconnect are arranged and compute operations are sequenced.
The current bellwether, the von Neumann architecture (VNA), has so far sustained user requirements, but whether it can serve requirements at exascale has been a defining question for some time. This has driven research into alternatives that diverge from VNA across a spectrum of architectural trade-offs. At one end are heterogeneous architectures that augment VNA with GPGPUs and FPGAs; at the other are completely new architectures such as cognitive/neural models and spatial computing. In this talk, we will look into the above barriers and what industry and academia are doing to overcome them. For example, to address the memory bandwidth challenge, new memory architectures such as the Hybrid Memory Cube have been proposed, which dramatically reduce power consumption compared to conventional DDR3/DDR4 memories while providing high bandwidth. We will also look at how the upcoming Xeon+FPGA architecture trades the added complexity of reconfigurable hardware for the benefit of higher performance per watt.