“With up to 72 processing cores, the Intel Xeon Phi processor x200 can accelerate applications tremendously. Each core contains two Advanced Vector Extensions (AVX) vector units, which speed up floating-point performance. This is important for machine learning applications, which in many cases use the Fused Multiply-Add (FMA) instruction.”
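To make the FMA point concrete, here is a rough back-of-the-envelope sketch of peak floating-point throughput. It assumes 512-bit vector units, double-precision (64-bit) elements, and one FMA issued per vector unit per cycle, with each FMA counted as two operations (a multiply and an add). The 1.5 GHz clock and the function names are illustrative assumptions, not figures from the quote.

```python
def peak_flops_per_cycle(cores, vector_units_per_core, vector_bits, element_bits):
    """Theoretical peak floating-point operations per clock cycle.

    Assumes every vector unit retires one FMA per cycle; an FMA counts
    as two floating-point operations (one multiply plus one add).
    """
    lanes = vector_bits // element_bits  # elements processed per vector instruction
    return cores * vector_units_per_core * lanes * 2


def peak_gflops(flops_per_cycle, clock_ghz):
    """Scale per-cycle throughput by an assumed clock frequency in GHz."""
    return flops_per_cycle * clock_ghz


# 72 cores and two vector units per core come from the quote above;
# the 512-bit width and the 1.5 GHz clock are assumptions for illustration.
per_cycle = peak_flops_per_cycle(
    cores=72, vector_units_per_core=2, vector_bits=512, element_bits=64
)
print(per_cycle)                              # operations per cycle
print(peak_gflops(per_cycle, clock_ghz=1.5))  # theoretical peak in GFLOP/s
```

Real applications land well below this ceiling, since it ignores memory bandwidth and any code that does not vectorize.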
Norbert Eicker from the Jülich Supercomputing Centre presented this talk at the SAI Computing Conference in London. “The ultimate goal is to reduce the burden on the application developers. To this end DEEP/-ER provides a well-accustomed programming environment that saves application developers from some of the tedious and often costly code modernization work. Confining this work to code-annotation as proposed by DEEP/-ER is a major advancement.”
“The heart of the Intel Xeon Phi coprocessor is a chip that does the heavy computation. The current version uses up to 16 channels of GDDR5 memory. An interesting note is that up to 32 memory devices can be used, by using both sides of the motherboard to hold the memory. This doubles the effective memory availability compared to more conventional designs.”
“High performance systems now typically include a host processor and a coprocessor. The role of the coprocessor is to give the developer and the user the ability to significantly speed up simulations, provided the algorithm can run with a high degree of parallelization and can take advantage of a SIMD architecture. The Intel Xeon Phi coprocessor is an example of a coprocessor that is used in many HPC systems today.”
The ability to develop applications independent of hardware availability at run time is an important concept that lets developers take advantage of the latest processing and coprocessing power. Not having to make run-time checks on hardware availability is critical to a smooth-running HPC environment.
“Native execution is good for applications that perform operations that map to parallelism in either threads or vectors. However, running natively on the coprocessor is not ideal when the application must do a lot of I/O or runs large parts of the application in a serial mode. Offloading has its own issues. Asynchronous allocation, copies, and deallocation of data can be performed, but doing so is complex. Another challenge with offloading is that it requires memory blocking. Overall, it is important to understand the application, the workflow within the application, and how to use the Intel Xeon Phi coprocessor most effectively.”
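The cost of large serial sections can be estimated with Amdahl's law, a standard model that the quote does not name but that fits its point. The sketch below is a simplification: it ignores I/O and data-transfer time and assumes every worker runs at the same speed. The function name and the example fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: overall speedup when only part of the run time parallelizes.

    parallel_fraction: share of single-worker run time that can run in parallel (0..1)
    workers: number of parallel workers (cores or threads), at least 1
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical applications running on 72 cores.
for fraction in (0.50, 0.90, 0.99):
    print(f"{fraction:.0%} parallel -> {amdahl_speedup(fraction, 72):.1f}x")
```

Under this model, an application that is half serial cannot reach even a 2x speedup on 72 cores, which is why serial-heavy code is a poor fit for native execution.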
For decades, Intel has been enabling insight and discovery through its technologies and contributions to parallel computing and High Performance Computing (HPC). Central to the company’s most recent work in HPC is a new design philosophy for clusters and supercomputers called Intel® Scalable System Framework (Intel® SSF), an approach designed to enable sustained, balanced performance as the community pushes towards the Exascale era.
In this video from ISC 2016, Steve Branton from Asetek describes the company’s innovative liquid cooling solutions for HPC. “Because liquid is 4,000 times better at storing and transferring heat than air, Asetek’s solutions provide immediate and measurable benefits to large and small data centers alike. RackCDU D2C is a ‘free cooling’ solution that captures between 60% and 80% of server heat, reducing data center cooling cost by over 50% and allowing 2.5x-5x increases in data center server density. D2C removes heat from CPUs, GPUs, and memory modules within servers using water as hot as 40°C (105°F), eliminating the need for chilling to cool these components.”
In this video from ISC 2016, Bill Bryce from Univa describes how the company’s innovative container technology helps customers manage their computing workloads with Univa Grid Engine. “Grid Engine 8.4.0 has many significant updates including Docker support and integration with the new Intel Xeon Phi processor,” said Bill Bryce, Vice President of Products at Univa. “This latest release will allow a user or administrator to schedule jobs so that the right business-critical jobs are prioritized over other workloads, thus maximizing shared resources and allowing Univa customers to gain velocity.”
“Intel provided a wealth of machine learning announcements following the Intel Xeon Phi processor (formerly known as Knights Landing) announcement at ISC’16. Building upon the various technologies in Intel Scalable System Framework, the machine learning community can expect up to 38% better scaling over GPU-accelerated machine learning and an up to 50x speedup when using 128 Intel Xeon Phi nodes compared to a single Intel Xeon Phi node. The company also announced an up to 30x improvement in inference performance (also known as scoring or prediction) on the Intel Xeon E5 product family due to an optimized Intel Caffe plus Intel Math Kernel Library (Intel® MKL) package.”
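For context on the scaling figure, a speedup can be turned into a parallel efficiency by dividing it by the node count. The short sketch below applies that to the quoted "up to 50x on 128 nodes" claim. Because the quote gives an upper bound, the result is a best case; the function name is an illustrative assumption.

```python
def parallel_efficiency(speedup, nodes):
    """Fraction of ideal linear scaling achieved: speedup divided by node count."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup / nodes


# Figures from the quote: up to 50x speedup on 128 nodes versus a single node.
efficiency = parallel_efficiency(50, 128)
print(f"{efficiency:.1%}")  # prints 39.1%
```

Perfect linear scaling would give 128x, so the quoted best case corresponds to roughly 39% of ideal.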