Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


The Internet of Things and Tuning

“Understanding how the pipeline slots are being utilized can greatly increase the performance of the application. If pipeline slots are blocked for some reason, performance will suffer. Likewise, getting an understanding of the various cache misses can lead to a better organization of the data. This can increase performance while reducing latencies of memory to CPU.”

The OpenMP API Celebrates 20 Years of Success

OpenMP is a good example of how hardware and software vendors, researchers, and academia, volunteering to work together, can successfully design a standard that benefits the entire developer community. Today, most software vendors track OpenMP advances closely and have implemented the latest API features in their compilers and tools. With OpenMP, application portability is assured across the latest multicore systems, including Intel Xeon Phi processors.

C++ Parallel STL Introduced in Intel Parallel Studio XE 2018 Beta

Parallel STL now makes it possible to transform existing sequential C++ code to take advantage of the threading and vectorization capabilities of modern hardware architectures. It does this by extending the C++ Standard Template Library with an execution policy argument that specifies the degree of threading and vectorization for each algorithm used.

Intel Advisor Roofline Analysis Finds New Opportunities for Optimizing Application Performance

Intel Advisor, an integral part of Intel Parallel Studio XE 2017, can help identify portions of code that could be good candidates for parallelization (both vectorization and threading). It can also help determine when it might not be appropriate to parallelize a section of code, depending on the platform, processor, and configuration it’s running on. Intel Advisor Roofline Analysis reveals the gap between an application’s performance and its expected performance.

Six Steps Towards Better Performance on Intel Xeon Phi

“As with all new technology, developers will have to create processes in order to modernize applications to take advantage of any new feature. Rather than randomly trying to improve the performance of an application, it is wise to be very familiar with the application and use available tools to understand bottlenecks and look for areas of improvement.”

Video: Modern Code – Making the Impossible Possible

In this video, Rich Brueckner from insideHPC moderates a panel discussion on Code Modernization. “SC15 luminary panelists reflect on collaboration with Intel and how building on hardware and software standards facilitates performance on parallel platforms with greater ease and productivity. By sharing their experiences modernizing code we hope to shed light on what you might see from modernizing your own code.”

IA Optimized Python Rocks in Production

“Intel recently announced the first product release of its High Performance Python distribution powered by Anaconda. The product provides a prebuilt easy-to-install Intel Architecture (IA) optimized Python for numerical and scientific computing, data analytics, HPC and more. It’s a free, drop in replacement for existing Python distributions that requires no changes to Python code. Yet benchmarks show big Intel Xeon processor performance improvements and even bigger Intel Xeon Phi processor performance improvements.”

Video: Intel Scalable System Framework

Gary Paek from Intel presented this talk at the HPC User Forum in Austin. “Traditional high performance computing is hitting a performance wall. With data volumes exploding and workloads becoming increasingly complex, the need for a breakthrough in HPC performance is clear. Intel Scalable System Framework provides that breakthrough. Designed to work for small clusters to the world’s largest supercomputers, Intel SSF provides scalability and balance for both compute- and data intensive applications, as well as machine learning and visualization. The design moves everything closer to the processor to improve bandwidth, reduce latency and allow you to spend more time processing and less time waiting.”

Facilitate HPC Deployments with Reference Designs for Intel Scalable System Framework

With Intel Scalable System Framework Architecture Specification and Reference Designs, the company is making it easier to accelerate the time to discovery through high-performance computing. The Reference Architectures (RAs) and Reference Designs take Intel Scalable System Framework to the next step—deploying it in ways that will allow users to confidently run their workloads and allow system builders to innovate and differentiate designs

Shared Memory and MPI 3.0

As multi-socket, then multi-core systems have become the standard, the Message Passing Interface (MPI) has become one of the most popular programming models for applications that can run in parallel using many sockets and cores. Shared memory programming interfaces, such as OpenMP, have allowed developers to take advantage of systems that combine many individual servers and shared memory within the server itself. However, two different programming models have been used at the same time. The MPI 3.0 standard allows for a new MPI interprocess shared memory extension (MPI SHM).