Diagnose Cluster Health with Intel® Cluster Checker

Intel® Cluster Checker, distributed as part of Intel® Parallel Studio XE 2018 Cluster Edition, provides a set of system diagnostics and analysis methods in a single tool to assist in managing clusters of any size. “Think of Intel Cluster Checker as a clinical system that detects signs that issues affecting the health of the cluster exist, diagnoses those issues, and suggests remedies. Using common diagnostic tools, it gathers signs that may indicate symptoms, leading to a diagnosis and a possible solution.”

Parallel Applications Speed Up Manufacturing Product Development

The product design process has undergone a significant transformation with the availability of supercomputing power at traditional workstation prices. With over 100 threads available to an application on compact two-socket servers, scalable performance for the applications used in product design and development is just a keyboard away for a wide range of engineers.

OpenMP at 20: Moving Forward to 5.0

This year, OpenMP*, the widely used API for shared memory parallelism supported in many C/C++ and Fortran compilers, turns 20. OpenMP is a great example of how hardware and software vendors, researchers, and academia, volunteering to work together, can successfully design a specification that benefits the entire developer community.
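
For readers new to the specification, the sketch below illustrates the kind of shared-memory parallelism OpenMP expresses in C. It is a minimal example of our own devising (the array size and variable names are not from the article), not an excerpt from the standard.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N];

    /* The loop iterations are divided among the available threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        b[i] = 2.0 * a[i];
    }

    printf("b[N-1] = %f, computed with up to %d threads\n",
           b[N - 1], omp_get_max_threads());
    return 0;
}
```

Built with any OpenMP-aware compiler (for example, gcc -fopenmp or icc -qopenmp), the single pragma is enough to spread the loop across all cores of a node.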

Intel Parallel Studio XE 2018 For Demanding HPC Applications

“For those who develop HPC applications, there are usually two main areas that must be considered. The first is the translation of the algorithm, whether simulation based, physics based, or pure research, into code that a modern computer system can run. A second challenge is how to move from the implementation of an algorithm to performance that takes full advantage of modern CPUs and accelerators.”

Intel Parallel Studio XE 2018 Released

Intel has announced the release of Intel® Parallel Studio XE 2018, with updated compilers and developer tools. It is now available for download on a 30-day trial basis. “This week’s formal release of the fully supported product is notable for new features that further enhance the toolset for accelerating HPC applications.”

The Internet of Things and Tuning

“Understanding how the pipeline slots are being utilized can greatly increase the performance of an application. If pipeline slots are blocked for some reason, performance will suffer. Likewise, an understanding of the various cache misses can lead to a better organization of the data, which can increase performance while reducing memory-to-CPU latencies.”
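
As a hedged illustration of the kind of data reorganization the quote alludes to, the C sketch below contrasts an array-of-structures layout with a structure-of-arrays layout. The particle fields and sizes are invented for the example and are not taken from the article.

```c
#include <stddef.h>

#define N 100000

/* Array-of-structures: fields used by the loop sit next to fields it
   never touches, so each cache line fetched carries mostly unused data. */
struct particle_aos { double x, y, z, mass, charge, spare; };

/* Structure-of-arrays: the loop streams through contiguous x values,
   so every byte of each fetched cache line is useful. */
struct particle_soa {
    double x[N];
    double y[N];
    double z[N];
};

double sum_x_aos(const struct particle_aos *p, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;      /* strided access: one useful value per cache line */
    return s;
}

double sum_x_soa(const struct particle_soa *p, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p->x[i];     /* unit-stride access: full cache-line reuse */
    return s;
}
```

When only one field is touched per iteration, the structure-of-arrays version needs far fewer cache lines, and therefore far fewer cache misses, for the same amount of work.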

TensorFlow Deep Learning Optimized for Modern Intel Architectures

Researchers at Google and Intel recently collaborated to extract the maximum performance from Intel® Xeon and Intel® Xeon Phi processors running TensorFlow*, a leading deep learning and machine learning framework. This effort resulted in significant performance gains and paves the way for similar gains from the next generation of Intel products. Optimizing deep neural network (DNN) frameworks such as TensorFlow presents challenges not unlike those encountered with more traditional high performance computing applications for science and industry.

Internode Programming With MPI and Intel Xeon Phi Processor

“While MPI was originally developed for general-purpose CPUs and is widely used in the HPC space in that capacity, MPI applications can also be developed for and then deployed on the Intel Xeon Phi processor. With an understanding of the algorithms used in a specific application, tremendous performance can be achieved by using a combination of OpenMP and MPI.”
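
A minimal sketch of that hybrid pattern in C is shown below, assuming an MPI library with MPI_THREAD_FUNNELED support and an OpenMP-capable compiler; the rank and thread counts are placeholders, not recommendations from the article.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Hybrid hello world: MPI ranks across nodes, OpenMP threads within each rank. */
int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Ask for an MPI library that tolerates threaded callers. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    {
        printf("rank %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```

Launched with a few ranks per node and OMP_NUM_THREADS set accordingly, MPI distributes work across nodes while OpenMP threads share the many cores of each Intel Xeon Phi processor; the right rank/thread split depends on the application.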

More Than Ever, Vectorization and Multithreading are Essential for Performance

Employing a hybrid of MPI across nodes in a cluster, multithreading with OpenMP* on each node, and vectorization of loops within each thread results in multiple performance gains. In fact, most application codes will run slower on the latest supercomputers if they run purely sequentially. This means that adding multithreading and vectorization to applications is now essential for running efficiently on the latest architectures.
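
Within a node, the threading and vectorization layers can be expressed together with OpenMP directives, while MPI handles the across-node layer as in the previous sketch. The triad kernel below is a generic illustration (the function name and arguments are ours, not from the article).

```c
/* Compile with an OpenMP-enabled compiler, e.g. icc -qopenmp or gcc -fopenmp. */
void triad(int n, double *restrict a, const double *restrict b,
           const double *restrict c, double scalar)
{
    /* Threads split the iteration space; the 'simd' clause asks the
       compiler to vectorize each thread's chunk of the loop. */
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```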

Feed The Cores – Memory Bandwidth Usage

“Memory bandwidth to the CPUs has always been important. Typically, CPU cores would end up waiting for data from main memory whenever it was not already in cache. However, with the advanced capabilities of the Intel Xeon Phi processor, there are new concepts to understand and take advantage of.”
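
One of those new concepts is the on-package MCDRAM high-bandwidth memory of the Intel Xeon Phi processor. The C sketch below assumes the memkind library's hbwmalloc interface is installed and places a bandwidth-critical array in MCDRAM when it is available; the array and its size are illustrative only, not taken from the article.

```c
#include <hbwmalloc.h>   /* high-bandwidth memory allocator from the memkind library */
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)

int main(void)
{
    /* hbw_check_available() returns 0 when high-bandwidth memory (MCDRAM in
       flat or hybrid mode) is usable; otherwise fall back to ordinary DDR. */
    int have_hbw = (hbw_check_available() == 0);
    double *a = have_hbw ? hbw_malloc(N * sizeof *a) : malloc(N * sizeof *a);
    if (!a)
        return 1;

    /* Bandwidth-bound initialization; placed in MCDRAM, it streams much faster. */
    for (long i = 0; i < N; i++)
        a[i] = (double)i;

    printf("a[N-1] = %f (high-bandwidth memory: %s)\n",
           a[N - 1], have_hbw ? "yes" : "no");

    /* Pair each allocation with the matching free. */
    if (have_hbw)
        hbw_free(a);
    else
        free(a);
    return 0;
}
```

Link with -lmemkind; on a system without MCDRAM the program simply falls back to ordinary DDR allocations.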