A team of researchers at Berkeley Lab, PNNL, and Intel is working to ensure that computational chemists can compute efficiently on next-generation exascale machines. The team recently reached a milestone: it added thread-level parallelism on top of MPI-level parallelism in the planewave density functional theory (DFT) method of the popular NWChem software suite. “Planewave codes are useful for solution chemistry and materials science; they allow us to look at the structure, coordination, reactions and thermodynamics of complex dynamical chemical processes in solutions and on surfaces.”
The European PRACE initiative has published a new Best Practice Guide for Intel Xeon Phi, Knights Landing Edition. “This best practice guide provides information about Intel’s MIC architecture and programming models for the Intel Xeon Phi co-processor in order to enable programmers to achieve good performance of their applications. The guide covers a wide range of topics from the description of the hardware of the Intel Xeon Phi co-processor through information about the basic programming models as well as information about porting programs up to tools and strategies how to analyze and improve the performance of applications.”
In this special guest feature, James Reinders looks at Intel Xeon Phi processors from a programmer’s perspective. “How does a programmer think of Intel Xeon Phi processors? In this brief article, I will convey how I, as a programmer, think of them. In subsequent articles, I will dive a bit more into details of various programming modes, and techniques employed for some key applications. In this article, I will endeavor to not stray into deep details – but rather offer an approachable perspective on how to think about programming for Intel Xeon Phi processors.”
Today the Barcelona Supercomputing Center announced plans to build MareNostrum 4, a 13.7-petaflop supercomputer that will be 12.4 times more powerful than the current MareNostrum 3 system. Under a contract valued at almost €30 million, IBM will integrate its own technologies with those of Lenovo, Intel, and Fujitsu into a single machine.
Although earlier generations of AVX instructions exist, the AVX-512 instructions can significantly improve the performance of HPC applications. “The new AVX-512 instructions have been designed with developers in mind. High level languages that are used for HPC applications, such as FORTRAN and C/C++, through a compiler will be able to use the new instructions. This can be accomplished through the use of pragmas to direct the compilers to generate the new instructions, or users can use libraries which are tuned to the new technology.”
“With up to 72 out-of-order cores, the new Intel Xeon Phi processor delivers over 3 teraFLOPS (floating-point operations per second) of double-precision peak while providing 3.5 times higher performance per watt than the previous generation. As a bootable CPU with integrated architecture, the Intel Xeon Phi processor eliminates PCIe bottlenecks, includes on-package high-bandwidth memory, and available integrated Intel Omni-Path fabric architecture to deliver fast, low-latency performance.”
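The “over 3 teraFLOPS” figure can be reproduced with back-of-the-envelope arithmetic. The calculation below assumes a 1.5 GHz clock for the top 72-core part and two AVX-512 vector units per core, each completing one 8-wide double-precision fused multiply-add (2 flops per lane) per cycle. Clocks under sustained AVX-512 load are typically lower, so this is a theoretical ceiling rather than a measured rate.

$$
P_{\text{peak}} = \underbrace{72}_{\text{cores}} \times \underbrace{1.5 \times 10^{9}\,\text{s}^{-1}}_{\text{clock (assumed)}} \times \underbrace{2}_{\text{VPUs/core}} \times \underbrace{8}_{\text{DP lanes/VPU}} \times \underbrace{2}_{\text{flops/FMA}} = 3.456 \times 10^{12}\ \text{FLOP/s} \approx 3.46\ \text{TFLOP/s}
$$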
Adrian Jackson from EPCC at the University of Edinburgh presented this tutorial to ARCHER users. “We have been working for a number of years on porting computational simulation applications to the KNC, with varying successes. We were keen to test this new processor with its promise of 3x serial performance compared to the KNC and 5x memory bandwidth over normal processors (using the high-bandwidth, MCDRAM, memory attached to the chip).”
Los Alamos National Laboratory (LANL) reports that a moment of inspiration during a wiring-diagram review has saved more than $2 million in material and labor costs for its Trinity supercomputer.
Kyoto University Thinks Widening SIMD Will be Key to Performance Gains in New Intel Xeon Phi processor-based Cray System
“With an imminent switchover to a new Cray system with next-generation Intel Xeon Phi Processors (codenamed Knights Landing) planned for October, the ACCMS team at Kyoto University is eagerly looking forward to a potential two-fold application performance improvement from its new system. But the lab is also well aware that there is significant recoding work ahead before the promise of the new manycore technology can be realized.”
“Intel provided a wealth of machine learning announcements following the Intel Xeon Phi processor (formerly known as Knights Landing) announcement at ISC’16. Building upon the various technologies in Intel Scalable System Framework, the machine learning community can expect up to 38% better scaling over GPU-accelerated machine learning and an up to 50x speedup when using 128 Intel Xeon Phi nodes compared to a single Intel Xeon Phi node. The company also announced an up to 30x improvement in inference performance (also known as scoring or prediction) on the Intel Xeon E5 product family due to an optimized Intel Caffe plus Intel Math Kernel Library (Intel® MKL) package.”
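The 128-node figure can also be read as a parallel efficiency, the speedup divided by the node count. The arithmetic below uses only the numbers quoted above and takes the single-node run as the baseline:

$$
E_{128} = \frac{S_{128}}{128} = \frac{50}{128} \approx 0.39
$$

That is, the quoted upper bound corresponds to roughly 39% of ideal linear scaling across 128 nodes.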