Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


Vectorization Now More Important Than Ever

Vectorization, the hardware optimization technique synonymous with early vector supercomputers like the Cray-1 (1975), has reappeared with even greater importance than before. Today, 40+ years later, the AVX-512 vector instructions in the most recent many-core Intel Xeon and Intel® Xeon PhiTM processors can increase application performance by 16x for single-precision codes.

Intel MKL Speeds Up Small Matrix-Matrix Multiplication for Automatic Driving

Certain applications, such as automated driving, require low latency small matrix-matrix multiplication in real time. They use specialized libraries that can be customized for small matrix operations. Recompiling and linking those libraries with the highly optimized DGEMM routine in the Intel® Math Kernel Library 2018 can give speedups many times over native libraries.

Using the Intel C++ Compiler’s Optimization Features to Improve MySQL Performance

IT operations and maintenance developers have found that just by compiling the MySQL source code with the Intel C++ Compiler and turning on its Interprocedural Optimization feature, you can improve database performance from 5 to 35% compared with other compilers. “While there may be many factors affecting MySQL performance, such as hardware and software configuration, having a thoroughly optimized MySQL package is a good place to start.”

Use Intel® Inspector to Diagnose Hidden Memory and Threading Errors in Parallel Code

Intel Inspector is an integrated debugger that can easily diagnose latent and intermittent errors and guide users to locate the root cause. It does this by instrumenting the binaries, including dynamically generated or linked libraries, even when the source code is not available. This includes C, C++, and legacy Fortran codes.

Intel Advisor’s TBB Flow Graph Analyzer: Making Complex Layers of Parallelism More Manageable

Some deep learning applications tend to have very complex graphs with thousands of nodes and edges. To make it easier to visualize, analyze, design, and tune such complex parallel applications employing Intel TBB flow graphs, Intel provides Intel Advisor Flow Graph Analyzer (Intel FGA). It gives developers a comprehensive set of tools to examine, debug, and analyze Intel TBB flow graphs.

A New Way to Visualize Performance Optimization Tradeoffs

A valuable feature of Intel Advisor is its Roofline Analysis Chart, which provides an intuitive and powerful visualization of actual performance measured against hardware-imposed performance ceilings. Intel Advisor’s vector parallelism optimization analysis and memory-versus-compute roofline analysis, working together, offer a powerful tool for visualizing an application’s complete current and potential performance profile on a given platform.

Exploring the OpenHPC Solution for HPC

To meet some of the biggest HPC software challenges, representatives from more than 25 academic, research, and commercial organizations formed a community project at the end of 2015: OpenHPC. This is the third article in a four-part series that explores going beyond OpenHPC with Intel HPC Orchestrator. Download the full insideHPC Special Report.

Building Fast Data Compression Code with Intel Integrated Performance Primitives (Intel IPP) 2018

Intel® Integrated Performance Primitives (Intel IPP) is a highly optimized, production-ready, library for lossless data compression/decompression targeting image, signal, and data processing, and cryptography applications. Intel IPP includes more than 2,500 image processing, 1,300 signal processing, 500 computer vision, and 300 cryptography optimized functions for creating digital media, enterprise data, embedded, communications, and scientific, technical, and security applications.

Intel Compilers 18.0 Tune for AVX-512 ISA Extensions

Intel Compilers 18.0 and Intel Parallel Studio XE 2018 tuning software fully support the AVX-512 instructions. By widening and deepening the vector registers, the new instructions and added enhancements let the compiler squeeze more vector parallelism out of applications than before. Applications compiled with the –xCORE-AVX512 will generate an executable that utilizes these new high-performance instructions.

Challenges to Managing an HPC Software Stack

The HPC system software stack tends to be complicated, assembled out of a diverse mix of somewhat compatible open source and commercial components. This is the second article in a four-part series that explores using Intel HPC Orchestrator to solve HPC software stack management challenges. Download the full insideHPC Special Report.