Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


More Than Ever, Vectorization and Multithreading are Essential for Performance

Employing a hybrid of MPI across nodes in a cluster, multithreading with OpenMP* on each node, and vectorization of loops within each thread results in multiple performance gains. In fact, most application codes will run slower on the latest supercomputers if they run purely sequentially. This means that adding multithreading and vectorization to applications is now essential for running efficiently on the latest architectures.

3X Performance Boost Using Intel Advisor and Intel Trace Analyzer in Astrophysics Simulations

On today’s processors, it is crucial to both vectorize (using AVX* or SIMD* instructions) and parallelize software to realize the full performance potential of the processor. By optimizing their MHD astrophysics applications with tools from Intel Parallel Studio XE, and running on the latest Intel hardware, the NSU team achieved a performance speed-up of 3X, cutting the standard time for calculating one problem from one week to just two days.

Go with Intel® Data Analytics Acceleration Library and Go*

Use of the Go* programming language and it’s developer community has grown significantly since it’s official launch by Google in 2009. Like many popular programming languages (C and Java come to mind), Go started as an experiment to design a new programming language that would fix some of the common problems of other languages and yet stay true to the basic tenets of modern programming: be scalable, productive, readable, enable robust development environments, and support networking and multiprocessing.

Deep Learning Frameworks Get a Performance Benefit from Intel MKL Matrix-Matrix Multiplication

Intel® Math Kernel Library 2017 (Intel® MKL 2017) includes new GEMM kernels that are optimized for various skewed matrix sizes. The new kernels take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and achieves high GEMM performance on multicore and many-core Intel® architectures, particularly for situations arising from deep neural networks..

Multicore Performance Challenges for Game Developers

Game developers face a unique challenge – how to make their graphics-heavy applications perform well across a very wide spectrum of hardware devices, not just high-end systems. So while an early version of a game might have been developed on some high-end system with 10 teraflops of CPU potential in a discrete graphics card, how do you scale it down to smaller consumer devices where optimization options are more limited?

The OpenMP API Celebrates 20 Years of Success

OpenMP is a good example of how hardware and software vendors, researchers, and academia, volunteering to work together, can successfully design a standard that benefits the entire developer community. Today, most software vendors track OpenMP advances closely and have implemented the latest API features in their compilers and tools. With OpenMP, application portability is assured across the latest multicore systems, including Intel Xeon Phi processors.

C++ Parallel STL Introduced in Intel Parallel Studio XE 2018 Beta

Parallel STL now makes it possible to transform existing sequential C++ code to take advantage of the threading and vectorization capabilities of modern hardware architectures. It does this by extending the C++ Standard Template Library with an execution policy argument that specifies the degree of threading and vectorization for each algorithm used.

Intel Advisor Roofline Analysis Finds New Opportunities for Optimizing Application Performance

Intel Advisor, an integral part of Intel Parallel Studio XE 2017, can help identify portions of code that could be good candidates for parallelization (both vectorization and threading). It can also help determine when it might not be appropriate to parallelize a section of code, depending on the platform, processor, and configuration it’s running on. Intel Advisor Roofline Analysis reveals the gap between an application’s performance and its expected performance.

Intel® VTune™ Amplifier Turns Raw Profiling Data Into Performance Insights

Discovering where the performance bottlenecks are and knowing what to do about it can be a mysterious and complex art, needing some very sophisticated performance analysis tools for success. That’s where Intel® VTune™ Amplifier XE 2017, part of Intel Parallel Studio XE, comes in.

Intel MKL and Intel TBB Working Together for Performance

When used in a TBB environment, Intel has demonstrated a many-fold performance improvement over the same parallelized code using Intel MKL in an OpenMP environment. Intel TBB-enabled Intel MKL is ideal when there is heavy threading in the Intel TBB application. Intel TBB-enabled Intel MKL shows solid performance improvements through better interoperability with other parts of the workload.