Use Intel Media SDK to Build Cross-Platform High-Quality Video Workflows

The latest release of Intel® Media SDK offers a single, cross-platform, GPU-enabled API for building optimized media and video applications, from PCs to workstations and into the cloud.

Deep Learning Open Source Framework Optimized on Apache Spark*

Intel recently released BigDL, an open source, highly optimized, distributed deep learning framework for Apache Spark*. It turns Hadoop/Spark into a unified platform for data storage, data processing and mining, feature engineering, traditional machine learning, and deep learning workloads, resulting in better economies of scale, higher resource utilization, ease of use and development, and lower TCO.

Maximizing Performance of HiFUN* CFD Solver on Intel® Xeon® Scalable Processor With Intel MPI Library

The HiFUN CFD solver shows that the latest-generation Intel Xeon Scalable processor enhances single-node performance thanks to its large cache, higher core density per CPU, higher memory speed, and larger memory bandwidth. The higher core density improves intra-node parallel performance, permitting users to build more compact clusters for a given number of processor cores. Better cache utilization, combined with a high-performance interconnect between nodes and the highly optimized Intel® MPI Library, lets the HiFUN solver achieve super-linear performance gains.
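
As an illustration of how such scaling is typically measured (a minimal sketch, not HiFUN itself), the MPI timing loop below splits a fixed workload across ranks; running it with increasing rank counts (mpirun -n 1, 2, 4, ...) shows how per-rank working sets shrink, which is where cache effects and super-linear speedup can appear.

    // Hedged sketch: times a fixed dummy workload under MPI to observe
    // scaling. Compile with: mpicxx -O2 scaling.cpp
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long N = 100000000L;      // total work, split across ranks
        long chunk = N / size;          // remainder ignored in this sketch
        double local = 0.0;
        double t0 = MPI_Wtime();
        for (long i = rank * chunk; i < (rank + 1) * chunk; ++i)
            local += 1.0 / (1.0 + (double)i);   // dummy compute kernel
        double sum = 0.0;
        MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("ranks=%d time=%.3fs sum=%f\n", size, t1 - t0, sum);
        MPI_Finalize();
        return 0;
    }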

Data Compression Optimized with Intel® Integrated Performance Primitives

Intel® Integrated Performance Primitives (Intel IPP) offers developers a highly optimized, production-ready library for lossless data compression and decompression, targeting image, signal, and data processing, and cryptography applications. The Intel IPP optimized implementations of the common data compression algorithms are “drop-in” replacements for the original compression code.
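
For example, Intel IPP ships patches that build zlib-compatible binaries, so code written against the standard zlib API, as in the sketch below, can pick up the optimized implementation simply by relinking. The snippet uses only stock zlib calls; no source changes are needed.

    // Minimal zlib-style compression sketch; an IPP-patched zlib keeps
    // this exact API, so relinking is enough to use the optimized code.
    #include <zlib.h>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main() {
        const char* text = "hello hello hello hello hello";
        uLong srcLen = (uLong)strlen(text) + 1;
        uLongf dstLen = compressBound(srcLen);   // worst-case output size
        std::vector<Bytef> dst(dstLen);

        if (compress(dst.data(), &dstLen, (const Bytef*)text, srcLen) != Z_OK) {
            fprintf(stderr, "compress failed\n");
            return 1;
        }
        printf("compressed %lu bytes to %lu bytes\n", srcLen, dstLen);
        return 0;
    }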

Intel AVX Gives Numerical Computations in Java a Big Boost

Recent Intel® enhancements to Java enable faster and better numerical computing. In particular, the Java Virtual Machine (JVM) now uses the Fused Multiply Add (FMA) instructions available with Intel® Advanced Vector Extensions (Intel AVX) on Intel® Xeon Phi™ processors to implement the OpenJDK 9 Math.fma() API. This gives significant performance improvements for matrix multiplication, the most basic computation found in most HPC, machine learning, and AI applications.
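
Java's Math.fma(a, b, c) computes a*b + c in a single fused step. The sketch below shows the same pattern in C++ via std::fma (used here as a stand-in for the Java API) inside a matrix-multiply inner loop, where the compiler or JIT can map it to a hardware FMA instruction.

    // FMA in a matrix-multiply inner loop; std::fma(a, b, c) computes
    // a*b + c in one fused step, mirroring Java's Math.fma().
    #include <cmath>
    #include <cstdio>

    int main() {
        const int n = 3;
        double A[3][3] = {{1,2,3},{4,5,6},{7,8,9}};
        double B[3][3] = {{9,8,7},{6,5,4},{3,2,1}};
        double C[3][3] = {};

        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                for (int k = 0; k < n; ++k)
                    C[i][j] = std::fma(A[i][k], B[k][j], C[i][j]);

        printf("C[0][0] = %g\n", C[0][0]);  // 1*9 + 2*6 + 3*3 = 30
        return 0;
    }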

Intel MKL Speeds Up Automated Driving Workloads on the Intel Xeon Processor

The automated driving developer community typically uses Eigen*, a C++ math library, for the matrix operations required by the Extended Kalman Filter (EKF) algorithm. EKF usually involves many small matrices. However, most HPC library routines for matrix operations are optimized for large matrices. “Intel MKL provides a highly tuned xGEMM function for matrix-matrix multiplication, with special paths for small matrices. Eigen can take advantage of Intel MKL through use of a compiler flag. A significant speedup results when using Eigen and Intel MKL and compiling the automated driving apps with the latest Intel C++ compiler.”
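
The compiler flag in question is the EIGEN_USE_MKL_ALL preprocessor define (e.g. -DEIGEN_USE_MKL_ALL), which routes Eigen's dense operations to Intel MKL. A minimal sketch, assuming Eigen and MKL are installed (build line is illustrative):

    // Build (paths illustrative):
    //   icpc -O2 -DEIGEN_USE_MKL_ALL -I/path/to/eigen ekf_step.cpp -mkl
    #include <Eigen/Dense>
    #include <cstdio>

    int main() {
        // Small dynamic-size matrices typical of an EKF prediction step.
        Eigen::MatrixXd P = Eigen::MatrixXd::Identity(4, 4);        // covariance
        Eigen::MatrixXd F = Eigen::MatrixXd::Random(4, 4);          // state transition
        Eigen::MatrixXd Q = Eigen::MatrixXd::Identity(4, 4) * 0.01; // process noise

        // Covariance prediction: P = F * P * F^T + Q.
        P = F * P * F.transpose() + Q;

        printf("P(0,0) = %f\n", P(0, 0));
        return 0;
    }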

Intel MKL Compact Matrix Functions Attain Significant Speedups

The latest version of Intel® Math Kernel Library (MKL) offers vectorized compact functions for general and specialized matrix computations on groups of small matrices. These functions rely on true SIMD (single instruction, multiple data) matrix computations, and they provide significant performance benefits compared to traditional techniques that exploit multithreading but rely on standard data formats.
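
The core idea, illustrated below with a hand-rolled sketch rather than the MKL compact API itself, is to interleave corresponding elements of many small matrices so that a single SIMD instruction processes the same element across all of them; the inner loop over the pack dimension is contiguous and vectorizes.

    // Hand-rolled illustration of the compact/interleaved layout idea:
    // element (i,j) of matrix p is stored at a[i][j][p], so the loop
    // over p vectorizes into true SIMD matrix math.
    #include <cstdio>

    constexpr int PACK = 16;  // matrices processed per SIMD sweep

    void multiply_compact(const double a[2][2][PACK],
                          const double b[2][2][PACK],
                          double c[2][2][PACK]) {
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                for (int k = 0; k < 2; ++k)
                    for (int p = 0; p < PACK; ++p)  // contiguous: vectorizes
                        c[i][j][p] += a[i][k][p] * b[k][j][p];
    }

    int main() {
        static double a[2][2][PACK], b[2][2][PACK], c[2][2][PACK];
        for (int p = 0; p < PACK; ++p) {            // identity * identity
            a[0][0][p] = a[1][1][p] = 1.0;
            b[0][0][p] = b[1][1][p] = 1.0;
        }
        multiply_compact(a, b, c);
        printf("c[0][0][0] = %g\n", c[0][0][0]);    // expect 1
        return 0;
    }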

Vectorization Now More Important Than Ever

Vectorization, the hardware optimization technique synonymous with early vector supercomputers like the Cray-1 (1975), has reappeared with even greater importance than before. Today, more than 40 years later, the AVX-512 vector instructions in the most recent many-core Intel Xeon and Intel® Xeon Phi™ processors can increase application performance by up to 16x for single-precision codes, since each 512-bit register holds 16 single-precision values.
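
A classic vectorizable loop is shown below: it is unit-stride and dependence-free, so with AVX-512 enabled (e.g., icpc -O2 -xCORE-AVX512, with -qopt-report to confirm) the compiler can process 16 floats per instruction.

    // SAXPY: y = a*x + y. Unit-stride and dependence-free, so the
    // compiler can vectorize it; one AVX-512 register covers 16 floats.
    #include <cstdio>

    void saxpy(int n, float a, const float* x, float* y) {
        #pragma omp simd              // hint: safe to vectorize
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1024;
        static float x[n], y[n];
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
        saxpy(n, 3.0f, x, y);
        printf("y[0] = %g\n", y[0]);  // expect 3*1 + 2 = 5
        return 0;
    }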

Intel MKL Speeds Up Small Matrix-Matrix Multiplication for Automatic Driving

Certain applications, such as automated driving, require low-latency small matrix-matrix multiplication in real time. They use specialized libraries that can be customized for small matrix operations. Recompiling and relinking those applications against the highly optimized DGEMM routine in the Intel® Math Kernel Library 2018 can deliver speedups several times over native libraries.
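
A minimal DGEMM call for a small matrix is sketched below; defining MKL_DIRECT_CALL at compile time lets MKL take its low-overhead path for small sizes. The build line is illustrative: icpc -DMKL_DIRECT_CALL -O2 small_gemm.cpp -mkl.

    // C = alpha*A*B + beta*C via MKL's DGEMM for a small 4x4 case.
    // Compiling with -DMKL_DIRECT_CALL enables the small-matrix path.
    #include <mkl.h>
    #include <cstdio>

    int main() {
        const MKL_INT n = 4;
        double A[16], B[16], C[16] = {};
        for (int i = 0; i < 16; ++i) { A[i] = 1.0; B[i] = 2.0; }

        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A, n, B, n, 0.0, C, n);

        printf("C[0] = %g\n", C[0]);  // 4 * (1*2) = 8
        return 0;
    }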

Using the Intel C++ Compiler’s Optimization Features to Improve MySQL Performance

IT operations and maintenance developers have found that just by compiling the MySQL source code with the Intel C++ Compiler and turning on its Interprocedural Optimization (IPO) feature, database performance improves by 5 to 35% compared with other compilers. “While there may be many factors affecting MySQL performance, such as hardware and software configuration, having a thoroughly optimized MySQL package is a good place to start.”
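
The IPO switch is -ipo on Linux (/Qipo on Windows). A hedged sketch of a CMake-based MySQL build using the Intel compiler follows; paths and flags are illustrative, not a verified recipe.

    # Illustrative build: select the Intel C++ compiler and enable IPO.
    cmake ../mysql-server \
        -DCMAKE_C_COMPILER=icc \
        -DCMAKE_CXX_COMPILER=icpc \
        -DCMAKE_C_FLAGS="-O3 -ipo" \
        -DCMAKE_CXX_FLAGS="-O3 -ipo"
    make -j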