Intel Advisor Roofline Analysis Finds New Opportunities for Optimizing Application Performance

Intel Advisor, an integral part of Intel Parallel Studio XE 2017, can help identify portions of code that could be good candidates for parallelization (both vectorization and threading). It can also help determine when it might not be appropriate to parallelize a section of code, depending on the platform, processor, and configuration it’s running on. Intel Advisor Roofline Analysis reveals the gap between an application’s performance and its expected performance.
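
To make the "gap" concrete, here is a minimal sketch of the basic roofline bound, which caps attainable performance at the lower of the compute peak and the product of arithmetic intensity and memory bandwidth. The function names and the hardware figures in the usage lines are hypothetical placeholders, not Intel Advisor output or its API.

```python
def roofline_bound(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Upper bound on attainable GFLOP/s under the basic roofline model.

    arithmetic_intensity: flops per byte moved to or from memory
    peak_gflops:          compute peak of the machine (assumed value)
    peak_bandwidth_gbs:   memory bandwidth peak in GB/s (assumed value)
    """
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def roofline_gap(measured_gflops, arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Return (bound, fraction of the bound actually achieved)."""
    bound = roofline_bound(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs)
    return bound, measured_gflops / bound


# Hypothetical machine: 1000 GFLOP/s compute peak, 100 GB/s memory bandwidth.
# A loop at 0.5 flops/byte is memory bound, so its ceiling is 50 GFLOP/s.
bound, fraction = roofline_gap(20.0, 0.5, 1000.0, 100.0)
print(bound, fraction)  # 50.0 0.4
```

In this hypothetical case the loop reaches 40% of its memory-bound ceiling, so reducing memory traffic is likely to pay off more than adding vector width.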

Six Steps Towards Better Performance on Intel Xeon Phi

“As with all new technology, developers will have to create processes in order to modernize applications to take advantage of any new feature. Rather than randomly trying to improve the performance of an application, it is wise to be very familiar with the application and use available tools to understand bottlenecks and look for areas of improvement.”

Video: Modern Code – Making the Impossible Possible

In this video, Rich Brueckner from insideHPC moderates a panel discussion on Code Modernization. “SC15 luminary panelists reflect on collaboration with Intel and how building on hardware and software standards facilitates performance on parallel platforms with greater ease and productivity. By sharing their experiences modernizing code we hope to shed light on what you might see from modernizing your own code.”

IA Optimized Python Rocks in Production

“Intel recently announced the first product release of its High Performance Python distribution powered by Anaconda. The product provides a prebuilt easy-to-install Intel Architecture (IA) optimized Python for numerical and scientific computing, data analytics, HPC and more. It’s a free, drop in replacement for existing Python distributions that requires no changes to Python code. Yet benchmarks show big Intel Xeon processor performance improvements and even bigger Intel Xeon Phi processor performance improvements.”

Video: Intel Scalable System Framework

Gary Paek from Intel presented this talk at the HPC User Forum in Austin. “Traditional high performance computing is hitting a performance wall. With data volumes exploding and workloads becoming increasingly complex, the need for a breakthrough in HPC performance is clear. Intel Scalable System Framework provides that breakthrough. Designed to work for small clusters to the world’s largest supercomputers, Intel SSF provides scalability and balance for both compute- and data intensive applications, as well as machine learning and visualization. The design moves everything closer to the processor to improve bandwidth, reduce latency and allow you to spend more time processing and less time waiting.”

Facilitate HPC Deployments with Reference Designs for Intel Scalable System Framework

With the Intel Scalable System Framework Architecture Specification and Reference Designs, Intel is making it easier to accelerate time to discovery through high-performance computing. The Reference Architectures (RAs) and Reference Designs take Intel Scalable System Framework to the next step: deploying it in ways that allow users to confidently run their workloads and allow system builders to innovate and differentiate designs.

Shared Memory and MPI 3.0

As multi-socket and then multi-core systems became the standard, the Message Passing Interface (MPI) became one of the most popular programming models for applications that run in parallel across many sockets and cores. Shared memory programming interfaces, such as OpenMP, let developers exploit the memory shared within each server of a system built from many individual servers. As a result, two different programming models have often been used at the same time. The MPI 3.0 standard adds a new MPI interprocess shared memory extension (MPI SHM).

Arithmetic Intensity of Stencil Operations

Applications that use 3D Finite Difference (3DFD) calculations are numerically intensive and can be optimized quite heavily to take advantage of accelerators that are available in today’s systems. The performance of an implementation can and should be optimized using numerical stencils. Choices made when designing and implementing algorithms can affect the Arithmetic Intensity (AI), which measures how efficient an implementation is by comparing its floating-point operations (flops) to its memory accesses.
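
As a rough illustration of how implementation choices change AI, the sketch below counts flops and bytes per grid point for a star-shaped 3D stencil. The counting conventions are assumptions made for this example only: one multiply per stencil point plus the adds needed to sum them, one write per output point, and either perfect cache reuse (each input read from memory once) or no reuse at all. The function name is hypothetical.

```python
def stencil_arithmetic_intensity(radius, bytes_per_value=4, perfect_reuse=True):
    """Flops per byte for one output point of a star-shaped 3D stencil.

    Assumptions (illustrative only):
      - the stencil touches 6 * radius + 1 points
      - each point costs one multiply, and summing them costs points - 1 adds
      - perfect_reuse=True:  one read and one write per grid point
      - perfect_reuse=False: every stencil input is re-read from memory
    """
    points = 6 * radius + 1
    flops = 2 * points - 1
    reads = 1 if perfect_reuse else points
    bytes_moved = (reads + 1) * bytes_per_value
    return flops / bytes_moved


# 7-point stencil (radius 1), single precision:
print(stencil_arithmetic_intensity(1))                       # 1.625
print(stencil_arithmetic_intensity(1, perfect_reuse=False))  # 0.40625
```

Under these assumptions the same stencil is four times more intense when neighboring values are reused from cache, which is why blocking and data layout matter as much as the arithmetic itself.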

Intel Updates Developer Toolkit with Data Analytics Acceleration Library

Today Intel released Intel Parallel Studio XE 2016, the next iteration of its developer toolkit for HPC and technical computing applications. This release introduces the Intel Data Analytics Acceleration Library, a library for big data developers that turns large data clusters into meaningful information with advanced analytics algorithms.

Video: Intel Vector Advisor Unlocks Code Performance

In this video, Rick Leinecker from Slashdot Media describes the Vectorization Advisor, one of the new additions to Intel Parallel Studio XE suite. “Vectorization Advisor is an analysis tool that lets you identify if loops utilize modern SIMD instructions or not, what prevents vectorization, what is performance efficiency and how to increase it. Vectorization Advisor shows compiler optimization reports in user-friendly way, and extends them with multiple other metrics, like loop trip counts, CPU time, memory access patterns and recommendations for optimization.”