Data Compression Optimized with Intel® Integrated Performance Primitives

Intel® Integrated Performance Primitives (Intel IPP) offers developers a highly optimized, production-ready library for lossless data compression and decompression, aimed at image processing, signal processing, data processing, and cryptography applications. The Intel IPP optimized implementations of the common data compression algorithms are “drop-in” replacements for the original compression code.
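A “drop-in” replacement implies that the calling code stays the same while the library underneath changes. As a minimal sketch of what such an unchanged call site looks like, the round trip below uses the zlib module from Python's standard library. The choice of Python and of zlib is an assumption made for illustration, as is the sample payload; the teaser does not say which algorithms or language bindings are covered, and whether a given zlib build is backed by Intel IPP depends on how it was built.

```python
import zlib

# Hypothetical payload: repetitive text compresses well under DEFLATE.
payload = b"insideHPC " * 1000

# Standard zlib calls. A drop-in optimized build would be invoked the same way;
# nothing in this call site is specific to Intel IPP.
compressed = zlib.compress(payload, level=6)
restored = zlib.decompress(compressed)

# Lossless compression: the round trip must reproduce the input byte for byte.
round_trip_ok = restored == payload
ratio = len(payload) / len(compressed)
print(f"lossless: {round_trip_ok}, compression ratio: {ratio:.1f}x")
```

Timing the same script against a stock and an optimized library build would be one way to check a claimed speedup, since the application code does not change between runs.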

Intel MKL Compact Matrix Functions Attain Significant Speedups

The latest version of Intel® Math Kernel Library (MKL) offers vectorized compact functions for general and specialized matrix computations of this type. These functions rely on true SIMD (single instruction, multiple data) matrix computations, and provide significant performance benefits compared to traditional techniques that exploit multithreading but rely on standard data formats.

Unlocking the Power of Parallel Coding to Access Better Performance in Multi-Core Environments

A number of different frameworks and standards can be employed for parallel coding. The most suitable choice depends on the purpose of the application, its overall requirements, and the target execution environment, as well as on the available memory, overheads, controls, and support. Selecting the right framework is imperative to obtaining the best possible performance increase.

Diagnose Cluster Health with Intel® Cluster Checker

Intel® Cluster Checker, distributed as part of Intel® Parallel Studio XE 2018 Cluster Edition, provides a set of system diagnostics and analysis methods in a single tool to assist in managing clusters of any size. “Think of Intel Cluster Checker as a clinical system that detects signs that issues affecting the health of the cluster exist, diagnoses those issues, and suggests remedies. Using common diagnostic tools signs that may indicate symptoms leading to a diagnosis and a possible solution.”

More Than Ever, Vectorization and Multithreading are Essential for Performance

Employing a hybrid of MPI across nodes in a cluster, multithreading with OpenMP* on each node, and vectorization of loops within each thread results in multiple performance gains. In fact, most application codes will run slower on the latest supercomputers if they run purely sequentially. This means that adding multithreading and vectorization to applications is now essential for running efficiently on the latest architectures.

3X Performance Boost Using Intel Advisor and Intel Trace Analyzer in Astrophysics Simulations

On today’s processors, it is crucial to both vectorize (using AVX* or SIMD* instructions) and parallelize software to realize the full performance potential of the processor. By optimizing their MHD astrophysics applications with tools from Intel Parallel Studio XE, and running on the latest Intel hardware, the NSU team achieved a performance speed-up of 3X, cutting the standard time for calculating one problem from one week to just two days.

Introduction to Parallel Programming with OpenACC

“This is the first in a series of short videos to introduce you to parallel programming with OpenACC and the PGI compilers, using C++ or Fortran. You will learn by example how to build a simple example program, how to add OpenACC directives, and to rebuild the program for parallel execution on a multicore system. To get the most out of this video, you should download the example programs and follow along on your workstation.”

Minimal Metrics Releases PerfMiner Parallel Optimization Tool

This week Minimal Metrics announced an early-adopter program for PerfMiner, which uses lightweight, pervasive performance data collection technology, automates the collection, and mines the data for key performance indicators. These indicators, developed through Minimal Metrics’ extensive experience tuning HPC and enterprise application performance, are presented in an audience-specific, drill-down hierarchy that provides accountability for site productivity down to the performance of individual application threads.

PRIMEHPC FX10 Fujitsu Supercomputer

Fujitsu developed the first Japanese supercomputer in 1977. In the thirty-plus years since then, we have been leading the development of supercomputers with the application of advanced technologies. We now introduce the PRIMEHPC FX10, a state-of-the-art supercomputer that makes the petascale computing achieved by the “K computer”(*1) more accessible.

SAS Analytics Using Direct Memory Access

Using analytics based on Remote Direct Memory Access and fast, scalable external disk systems with massively parallel access to data, SAS analytics-driven organizations can deliver timely and accurate execution for data-intensive workflows such as risk management, while incorporating larger datasets than is possible with traditional NAS.