Learn What to Do Next with Intel VTune Amplifier Application Performance Snapshot

Tuning code has long been an art. Knowing what to look for and how to correct inefficiencies in serious numerical computations has not been easy for most programmers. It’s often hard even to know which tool to start with, which is why the Intel® VTune™ Amplifier Application Performance Snapshot could prove to be a great way to get an instant summary of an application’s performance characteristics and issues.

Use Intel Media SDK to Build Cross-Platform High-Quality Video Workflows

The latest release of Intel® Media SDK offers a single, cross-platform, GPU-enabled API for building optimized media and video applications, from PCs to workstations and into the cloud.

Video: Speed Your Code with Intel Parallel Studio XE

“Modern processors perform their best with parallel code that’s both vectorized and threaded, which can run more than 100 times faster than serial code. So how can you accomplish this more easily through parallel programming? Enter Parallel Studio XE, a suite of tools that simplifies and speeds the design, building, tuning, and scaling of applications with the latest code modernization methods.”

Intel AVX Gives Numerical Computations in Java a Big Boost

Recent Intel® enhancements to Java enable faster and better numerical computing. In particular, the Java Virtual Machine (JVM) now uses the Fused Multiply Add (FMA) instructions on Intel® Xeon Phi™ processors with Advanced Vector Extensions (Intel AVX) to implement the OpenJDK 9 Math.fma() API. This gives significant performance improvements for matrix multiplication, the most basic computation found in most HPC, Machine Learning, and AI applications.
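As a rough illustration of the API mentioned above, the following sketch accumulates a dot product, the inner kernel of matrix multiplication, with Math.fma(). The class name FmaDot and the method dot are hypothetical names chosen for this example. Whether the JIT compiler actually emits a hardware FMA instruction depends on the JVM version and the CPU, so no speedup is claimed here; on hardware without FMA support, Math.fma() may fall back to a slower software path.

```java
public class FmaDot {
    // Hypothetical helper: dot product accumulated with Math.fma (Java 9+).
    // Each step computes a[i] * b[i] + sum with a single rounding.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum = Math.fma(a[i], b[i], sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.println(dot(a, b)); // prints 32.0
    }
}
```

Because Math.fma() rounds once per step rather than twice, results can differ in the last bits from a plain a[i] * b[i] + sum loop.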

Performance Insights Using the Intel Advisor Python API

Tuning a complex application for today’s heterogeneous platforms requires an understanding of the application itself as well as familiarity with the tools available for finding where in the code to look for bottlenecks. In general, optimizing the performance of an application requires the following steps, which are likely to apply to a wide range of applications.

Vectorization Now More Important Than Ever

Vectorization, the hardware optimization technique synonymous with early vector supercomputers like the Cray-1 (1975), has reappeared with even greater importance than before. Today, 40+ years later, the AVX-512 vector instructions in the most recent many-core Intel® Xeon and Intel® Xeon Phi™ processors can increase application performance by up to 16x for single-precision codes, since each 512-bit register holds sixteen 32-bit values.

A New Way to Visualize Performance Optimization Tradeoffs

A valuable feature of Intel Advisor is its Roofline Analysis Chart, which provides an intuitive and powerful visualization of actual performance measured against hardware-imposed performance ceilings. Intel Advisor’s vector parallelism optimization analysis and memory-versus-compute roofline analysis, working together, offer a powerful tool for visualizing an application’s complete current and potential performance profile on a given platform.
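The idea behind a roofline chart can be sketched in a few lines: attainable performance is capped by the lower of the peak compute rate and the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The sketch below is the textbook roofline formula, not Intel Advisor's implementation, and the class name, method name, and hardware figures are hypothetical values chosen for illustration.

```java
public class Roofline {
    // Textbook roofline bound:
    // attainable GFLOP/s = min(peak compute, bandwidth * arithmetic intensity)
    static double attainableGflops(double peakGflops,
                                   double bandwidthGBs,
                                   double flopsPerByte) {
        return Math.min(peakGflops, bandwidthGBs * flopsPerByte);
    }

    public static void main(String[] args) {
        double peak = 1000.0;     // hypothetical peak compute, GFLOP/s
        double bandwidth = 100.0; // hypothetical memory bandwidth, GB/s

        // Low arithmetic intensity: the memory ceiling limits performance.
        System.out.println(attainableGflops(peak, bandwidth, 0.25)); // prints 25.0
        // High arithmetic intensity: the compute ceiling limits performance.
        System.out.println(attainableGflops(peak, bandwidth, 20.0)); // prints 1000.0
    }
}
```

A measured kernel that sits well below both ceilings at its arithmetic intensity is a candidate for further vectorization or threading work.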

Intel Parallel Studio XE 2018 Released

Intel has announced the release of Intel® Parallel Studio XE 2018, with updated compilers and developer tools. It is now available for downloading on a 30-day trial basis. “This week’s formal release of the fully supported product is notable with new features that further enhance the toolset for accelerating HPC applications.”

TensorFlow Deep Learning Optimized for Modern Intel Architectures

Researchers at Google and Intel recently collaborated to extract the maximum performance from Intel® Xeon and Intel® Xeon Phi™ processors running TensorFlow*, a leading deep learning and machine learning framework. This effort resulted in significant performance gains and leads the way for ensuring similar gains from the next generation of products from Intel. Optimizing Deep Neural Network (DNN) frameworks such as TensorFlow presents challenges not unlike those encountered with more traditional High Performance Computing applications for science and industry.

More Than Ever, Vectorization and Multithreading are Essential for Performance

Employing a hybrid of MPI across nodes in a cluster, multithreading with OpenMP* on each node, and vectorization of loops within each thread results in multiple performance gains. In fact, most application codes will run slower on the latest supercomputers if they run purely sequentially. This means that adding multithreading and vectorization to applications is now essential for running efficiently on the latest architectures.