“With up to 72 out-of-order cores, the new Intel Xeon Phi processor delivers over 3 teraFLOPS (trillion floating-point operations per second) of double-precision peak performance while providing 3.5 times higher performance per watt than the previous generation. As a bootable CPU with integrated architecture, the Intel Xeon Phi processor eliminates PCIe* bottlenecks, includes on-package high-bandwidth memory, and is available with integrated Intel Omni-Path fabric architecture to deliver fast, low-latency performance.”
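The "over 3 teraFLOPS" figure is a theoretical peak: cores × vector units per core × double-precision lanes per vector × flops per fused multiply-add × clock rate. The sketch below works that arithmetic through. The per-core figures (two AVX-512 vector units, eight 64-bit lanes per 512-bit register, two flops per FMA) and the 1.5 GHz clock are assumptions supplied for illustration, not numbers stated in the quote; actual clocks vary by processor model.

```c
/* Theoretical peak double-precision rate, in FLOP/s.
 * Every parameter is supplied by the caller; the values used in the
 * example below (2 vector units, 8 lanes, 2 flops per FMA, 1.5 GHz)
 * are illustrative assumptions, not figures from the quoted text. */
double peak_dp_flops(int cores, int vpus_per_core, int dp_lanes,
                     int flops_per_fma, double clock_hz)
{
    /* Each vector unit retires one FMA across all lanes per cycle. */
    return (double)cores * vpus_per_core * dp_lanes * flops_per_fma * clock_hz;
}
```

With those assumptions, `peak_dp_flops(72, 2, 8, 2, 1.5e9)` gives 72 × 32 × 1.5 × 10⁹ ≈ 3.46 × 10¹² FLOP/s, consistent with "over 3 teraFLOPS". Sustained application performance is typically well below this peak.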
With the tremendous compute density of new processors, it is important to understand whether an application can take advantage of many cores. “Developers should understand whether an application might be ready to run in a highly vectorized or many-core environment before attempting the work necessary to obtain the high performance that might be expected.”
“Being ready with full support for Intel Xeon Phi from day one has been a key strategy for Allinea and underpins our approach for supporting customers, such as Los Alamos National Laboratory on the Trinity system, Argonne National Laboratory on Theta and NERSC on Cori, where work is now underway to port code and get applications ready for more complex science on a larger scale.”
Adrian Jackson from EPCC at the University of Edinburgh presented this tutorial to ARCHER users. “We have been working for a number of years on porting computational simulation applications to the KNC [Knights Corner, the previous Xeon Phi generation], with varying success. We were keen to test this new processor with its promise of 3x the serial performance of the KNC and 5x the memory bandwidth of normal processors (using the high-bandwidth MCDRAM memory attached to the chip).”
With the introduction of the Intel Scalable System Framework, the Intel Xeon Phi processor can significantly speed up finite element analysis (FEA). Using highly tuned math libraries such as the Intel Math Kernel Library (Intel MKL), FEA applications can execute math routines in parallel on the Intel Xeon Phi processor.
“Deep learning developers and researchers want to train neural networks as fast as possible. Right now we are limited by computing performance,” said Dr. Diamos. “The first step in improving performance is to measure it, so we created DeepBench and are opening it up to the deep learning community. We believe that tracking performance on different hardware platforms will help processor designers better optimize their hardware for deep learning applications.”
The National Computational Infrastructure in Canberra, Australia’s national advanced computing facility, is the first Australian institution to deploy the latest generation of Intel Xeon Phi processors, formerly code named Knights Landing. “NCI is leading efforts in the scientific community to tune applications for Intel Xeon Phi processors,” explains Dr Muhammad Atif, NCI’s HPC Systems and Cloud Services Manager. “We have identified a large number of applications that will benefit from this hardware and software paradigm, including those applications in the domains of computational physics, computational chemistry and climate research.”
“Fortran has proven extremely resilient to new developments that have appeared in other programming languages over the years. New versions continue to appear and are tied to ANSI/ISO standards, so an application written for one operating system should compile and run with different compilers on different operating systems. The latest version is Fortran 2008, with the next version reportedly to be available as Fortran 2015, in 2018.”
Vectorization and threading are critical to using innovative hardware products such as the Intel Xeon Phi processor. Using tools early in the design and development process to identify where vectorization can be used or improved will increase the performance of the overall application. Modern tools can determine what might be blocking compiler vectorization and estimate the potential gain from the work involved.
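As a minimal illustration of what "blocking compiler vectorization" can mean, the sketch below contrasts two hypothetical loops (the function names are ours, not from any tool or library). In the first, every iteration is independent, so a compiler can pack several elements into one SIMD instruction; in the second, each iteration needs the result of the previous one, a loop-carried dependency that prevents straightforward vectorization. A compiler's vectorization report (for example, GCC's `-fopt-info-vec` and `-fopt-info-vec-missed` options) is one way to see which loops were vectorized and why others were not.

```c
#include <stddef.h>

/* Independent iterations: y[i] depends only on x[i] and the old y[i],
 * so the compiler is free to process several elements per SIMD
 * instruction. `restrict` promises x and y do not overlap, removing an
 * aliasing concern that could otherwise inhibit vectorization. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependency: iteration i reads the value written by
 * iteration i - 1, so iterations cannot simply run side by side in
 * SIMD lanes. A vectorization report would typically flag this loop. */
void prefix_sum(size_t n, double *v)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

For example, `daxpy` with `a = 2.0`, `x = {1, 2, 3, 4}`, and `y = {1, 1, 1, 1}` leaves `y` as `{3, 5, 7, 9}`, while `prefix_sum` turns `{1, 2, 3, 4}` into `{1, 3, 6, 10}`. Restructuring or replacing loops like the second one is the kind of work whose payoff such tools help estimate.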
“An environment that assists in deep learning usually consists of algorithms that run at very high speed to draw conclusions from data. Processors such as the Intel Xeon Phi processor, which contain a significant number of processing cores and operate in a SIMD mode, are critical to these new environments. With the Intel Xeon Phi processor, new insights can be discovered from either existing data or new data sources.”