Intel Xeon Phi Coprocessor Architecture

“High performance systems now typically include a host processor and a coprocessor. The role of the coprocessor is to give the developer and the user the ability to significantly speed up simulations if the algorithm that is used can run with a high degree of parallelization and can take advantage of a SIMD architecture. The Intel Xeon Phi coprocessor is an example of a coprocessor that is used in many HPC systems today.”

Using Libraries in Offload Mode

The ability to develop applications independent of hardware availability at run time is an important concept that lets developers take advantage of the latest processing and coprocessing power. Not having to make run-time checks on hardware availability is critical to a smoothly running HPC environment.

Offloading vs Native Execution on Intel Xeon Phi Coprocessors

“Native execution is good for applications that perform operations that map to parallelism either in threads or vectors. However, running natively on the coprocessor is not ideal when the application must do a lot of I/O or runs large parts of the application in a serial mode. Offloading has its own issues. Asynchronous allocation, copies, and the deallocation of data can be performed, but it is complex. Another challenge with offloading is that it requires memory blocking. Overall, it is important to understand the application, the workflow within the application and how to use the Intel Xeon Phi coprocessor most effectively.”
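To make the "memory blocking" point concrete, here is a minimal host-only sketch in standard C++. It is not Intel's offload API: `process_block` and `process_in_blocks` are hypothetical names, and the kernel simply squares values on the host as a stand-in for work that would be shipped to a coprocessor. The idea it illustrates is that the data is walked in fixed-size blocks so that only one block has to fit in device memory at a time.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel that would run on the coprocessor.
// Here it just squares each element of one block in place, on the host.
void process_block(double* block, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        block[i] *= block[i];
    }
}

// Walk the data in fixed-size blocks so that only one block would need to
// be resident in (limited) device memory at any time.
// Returns the number of blocks processed.
std::size_t process_in_blocks(std::vector<double>& data, std::size_t block_size) {
    if (block_size == 0) {
        return 0;  // nothing sensible to do with an empty block size
    }
    std::size_t blocks = 0;
    for (std::size_t start = 0; start < data.size(); start += block_size) {
        // The last block may be shorter than block_size.
        const std::size_t count = std::min(block_size, data.size() - start);
        process_block(data.data() + start, count);
        ++blocks;
    }
    return blocks;
}
```

In a real offload setting, each call to the kernel would be paired with a transfer of that block to and from the coprocessor, which is where the asynchronous allocation and copy complexity mentioned in the excerpt comes in.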

Intel® Xeon Phi™ Processor—Highly Parallel Computing Engine for HPC

For decades, Intel has been enabling insight and discovery through its technologies and contributions to parallel computing and High Performance Computing (HPC). Central to the company’s most recent work in HPC is a new design philosophy for clusters and supercomputers called Intel® Scalable System Framework (Intel® SSF), an approach designed to enable sustained, balanced performance as the community pushes towards the Exascale era.

Video: Asetek Showcases Liquid Cooling at ISC 2016

In this video from ISC 2016, Steve Branton from Asetek describes the company’s innovative liquid cooling solutions for HPC. “Because liquid is 4,000 times better at storing and transferring heat than air, Asetek’s solutions provide immediate and measurable benefits to large and small data centers alike. RackCDU D2C is a “free cooling” solution that captures between 60% and 80% of server heat, reducing data center cooling cost by over 50% and allowing 2.5x-5x increases in data center server density. D2C removes heat from CPUs, GPUs, memory modules within servers using water as hot as 40°C (105°F), eliminating the need for chilling to cool these components.”

Video: Univa Grid Engine Speeds Workloads with Intel Xeon Phi Processor

In this video from ISC 2016, Bill Bryce from Univa describes how the company’s innovative container technology helps customers manage their computing workloads with Univa Grid Engine. “Grid Engine 8.4.0 has many significant updates including Docker support and integration with the new Intel Xeon Phi processor,” said Bill Bryce, Vice President of Products at Univa. “This latest release will allow a user or administrator to schedule jobs so that the right business-critical jobs are prioritized over other workloads, thus maximizing shared resources and allowing Univa customers to gain velocity.”

Intel Furthers Machine Learning Capabilities

“Intel provided a wealth of machine learning announcements following the Intel Xeon Phi processor (formerly known as Knights Landing) announcement at ISC’16. Building upon the various technologies in Intel Scalable System Framework, the machine learning community can expect up to 38% better scaling over GPU-accelerated machine learning and an up to 50x speedup when using 128 Intel Xeon Phi nodes compared to a single Intel Xeon Phi node. The company also announced an up to 30x improvement in inference performance (also known as scoring or prediction) on the Intel Xeon E5 product family due to an optimized Intel Caffe plus Intel Math Kernel Library (Intel® MKL) package.”

Thomas Sterling presents: HPC Achievement and Impact 2016

Thomas Sterling presented this keynote at ISC 2016 in Frankfurt. “Even as the hundred petaflops era is coming within sight, more dramatic programs to achieve exaflops capacity are now emerging with the expectation of this two orders of magnitude advance in the early part of the next decade. Yet the challenges of the end of Moore’s Law loom ever greater, threatening to impede further progress. Innovations in semiconductor technologies and processor socket architecture, matched with improvements in application development environments, promise to overcome such barriers. This keynote presentation will deliver a rapid-fire summary of the major accomplishments of the last year that promise a renaissance in supercomputing in the immediate future.”

Intel Xeon Phi Developer Access Platform at ISC 2016

In this video from ISC 2016, Kirti Devi from Intel describes the new Intel Developer Platform for the Intel Xeon Phi processor. “With this program, Intel and its partner Colfax are widening early levels of access, support and training for the widely anticipated next-generation Intel Xeon Phi processor release. The Developer Access Program gives developers the opportunity to begin leveraging key new capabilities in the processor before they are generally available. That means developers will have time to work to parallelize and vectorize their code and look for opportunities to exploit the massive performance capabilities that KNL offers so workloads are ready for prime time when customers deploy their next-generation systems.”

Programming Many Tasks for Many Cores

“Tasks keep the CPUs busy. When a core is working, rather than waiting for work to be sent to it, the application progresses towards its conclusion. A caveat to all of this is to remember that tasks and threads remain on the system they were created on. Tasks that use a shared memory space only work within the shared memory segment that the processing cores can get to. Shared memory on the CPU side of the system is separate from the shared memory on the coprocessor. The threads created will remain on the part of the system where they started.”
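As a rough illustration of the tasking idea, the sketch below splits a sum into several tasks using standard C++ `std::async`. It is a generic host-side example, not the tasking model the article itself covers, and `task_sum` is a hypothetical name. Note how every task reads the same `data` vector: that only works because all tasks run within one shared memory space, which is the caveat the excerpt raises about host versus coprocessor memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sum a vector by splitting it into up to num_tasks
// chunks, each summed by its own asynchronous task.
double task_sum(const std::vector<double>& data, std::size_t num_tasks) {
    if (num_tasks == 0) {
        num_tasks = 1;  // fall back to a single task
    }
    // Chunk size rounded up so that all elements are covered.
    const std::size_t chunk = (data.size() + num_tasks - 1) / num_tasks;

    std::vector<std::future<double>> parts;
    for (std::size_t start = 0; start < data.size(); start += chunk) {
        const std::size_t end = std::min(start + chunk, data.size());
        // Each task reads its slice of the shared vector; nothing is copied.
        parts.push_back(std::async(std::launch::async, [&data, start, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(start);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0.0);
        }));
    }

    // Collect the partial sums once each task has finished.
    double total = 0.0;
    for (auto& part : parts) {
        total += part.get();
    }
    return total;
}
```

Calling `task_sum(values, 4)` on a host vector keeps up to four cores busy with partial sums. Running the same work on a coprocessor would require moving the data there first, since its shared memory is separate from the host's.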