Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


Articles and news on parallel programming and code modernization

Deep Learning Frameworks Get a Performance Benefit from Intel MKL Matrix-Matrix Multiplication

Intel® Math Kernel Library 2017 (Intel® MKL 2017) includes new GEMM kernels that are optimized for various skewed matrix sizes. The new kernels take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and achieves high GEMM performance on multicore and many-core Intel® architectures, particularly for situations arising from deep neural networks..

Vectorization with AVX-512 Intrinsics

“With the Intel compilers, intrinsics are recognized and the instructions are generated in-line which is a tremendous advantage. Since the Intel Xeon Phi processor when using the AVX-512 intrinsics can perform a tremendous number of floating point operations per second, it is beneficial to use intrinsics for certain math computations. To use intrinsics, all that is needed is the proper header file and then to call the desired intrinsic function.”

Multicore Performance Challenges for Game Developers

Game developers face a unique challenge – how to make their graphics-heavy applications perform well across a very wide spectrum of hardware devices, not just high-end systems. So while an early version of a game might have been developed on some high-end system with 10 teraflops of CPU potential in a discrete graphics card, how do you scale it down to smaller consumer devices where optimization options are more limited?

Jump Start your Immersive Video Experiences

“With a wide range of Intel Xeon processors available, a truly immersive video experience can be delivered to a diverse set of end users who enjoy watching a concert or sporting event from wherever they may be located. While not an immersive VR experience, immersive video where the user has control over the viewing angle is a very important technology that can deliver exciting content to many consumers.”

The OpenMP API Celebrates 20 Years of Success

OpenMP is a good example of how hardware and software vendors, researchers, and academia, volunteering to work together, can successfully design a standard that benefits the entire developer community. Today, most software vendors track OpenMP advances closely and have implemented the latest API features in their compilers and tools. With OpenMP, application portability is assured across the latest multicore systems, including Intel Xeon Phi processors.

Transcoding for Optimal Video Consumption

Video streams may constructed using various standards, which contain information such as resolution, frame rate, color depth, etc. It is the job of the transcoder to take in one format and produce another format that would then be used downstream. While an application could be written that does the transformation, optimizing the application requires the expertise of the hardware manufacturer.

C++ Parallel STL Introduced in Intel Parallel Studio XE 2018 Beta

Parallel STL now makes it possible to transform existing sequential C++ code to take advantage of the threading and vectorization capabilities of modern hardware architectures. It does this by extending the C++ Standard Template Library with an execution policy argument that specifies the degree of threading and vectorization for each algorithm used.

Intel Processors for Machine Learning

Machine Learning is a hot topic for many industries and is showing tremendous promise to change how we use systems. From design and manufacturing to searching for cures for diseases, machine learning can be a great disrupter, when implemented to take advantage of the latest processors.

Intel Advisor Roofline Analysis Finds New Opportunities for Optimizing Application Performance

Intel Advisor, an integral part of Intel Parallel Studio XE 2017, can help identify portions of code that could be good candidates for parallelization (both vectorization and threading). It can also help determine when it might not be appropriate to parallelize a section of code, depending on the platform, processor, and configuration it’s running on. Intel Advisor Roofline Analysis reveals the gap between an application’s performance and its expected performance.

Intel® Graphics Performance Analyzer for Faster Graphics Performance

“Just as developers need tools to understand the performance of a CPU intensive application in order to modify the code for higher performance, so do those that develop interactive 3D computer graphics applications. An excellent tool for t this purpose is the Intel Graphics Performance Analyzer set. This tool, which is free to download, can help the developer understand at a very low level how the application is performing, from a number of aspects.”