Articles and news on parallel programming and code modernization

Multiple Endpoints in the Latest Intel MPI Library Boost Hybrid Performance

The performance of distributed-memory MPI applications on the latest highly parallel multi-core processors often turns out to be lower than expected, which is why hybrid applications, using OpenMP multithreading within each node and MPI across the nodes of a cluster, are becoming more common. This sponsored post from Intel, written by Richard Friedman, describes how to boost hybrid application performance with multiple endpoints in the Intel MPI Library.

PASC19 Evolves into an International Conference on Computational Science

In this video from PASC19, Torsten Hoefler from ETH Zurich describes how PASC19 has grown into an international conference, with over 60 percent of attendees coming from outside Switzerland. He then describes a groundbreaking new programming model his team is developing that centers on minimizing data movement during computation.

Achieving Parallelism in Intel Distribution for Python with Numba

The rapid growth in popularity of Python as a programming language for mathematics, science, and engineering applications has been remarkable. Not only is it easy to learn, but there is a vast trove of packaged open source libraries targeted at just about every computational domain imaginable. This sponsored post from Intel highlights how today’s enterprises can achieve high levels of parallelism in large-scale Python applications using the Intel Distribution for Python with Numba.

The Challenges of Updating Scientific Codes for New HPC Architectures

In this video from PASC19 in Zurich, Benedikt Riedel from the University of Wisconsin describes the challenges researchers face when updating their scientific codes for new HPC architectures. He then describes his work on the IceCube Neutrino Observatory.

Video: Data-Centric Parallel Programming

In this slidecast, Torsten Hoefler from ETH Zurich presents: Data-Centric Parallel Programming. “To maintain performance portability in the future, it is imperative to decouple architecture-specific programming paradigms from the underlying scientific computations. We present the Stateful DataFlow multiGraph (SDFG), a data-centric intermediate representation that enables separating code definition from its optimization.”

Intel Optimized Libraries Accelerate Deep Learning Applications on Intel Platforms

Whatever the platform, getting the best possible performance out of an application always presents big challenges. This is especially true when developing AI and machine learning applications on CPUs. This sponsored post from Intel explores how to effectively train and execute machine learning and deep learning projects on CPUs.

Video: Portable Programming Models Highlighted at PASC19

In this video from PASC19 in Zurich, Technical Papers co-chair Sunita Chandrasekaran provides some highlights from the conference. After that, Sunita previews the upcoming Workshop on Performance Portable Programming Models for Accelerators (P3MA) at ISC 2019. “This workshop will provide a forum to bring together researchers and developers to discuss the community’s proposals and solutions to performance portability.”

Appentra Releases Parallelware Trainer 1.2

Appentra is pleased to announce the release of Parallelware Trainer 1.2, further improving the provision of accessible HPC and parallel programming training using OpenMP and OpenACC. “Appentra has a clear goal: to make parallel programming easier, enabling everyone to make the best use of parallel computing hardware from the multi-cores in a laptop to the fastest supercomputers. Parallelware Trainer 1.2 provides an enhanced interactive learning environment, including provision for a knowledge base designed around the code being developed and several parallelization paradigms, including multithreading, tasking and offloading to GPUs.”

Are Memory Bottlenecks Limiting Your Application’s Performance?

Often, it’s not enough to parallelize and vectorize an application to get the best performance. You also need to take a deep dive into how the application is accessing memory to find and eliminate bottlenecks in the code that could ultimately be limiting performance. Intel Advisor, a component of both Intel Parallel Studio XE and Intel System Studio, can help you identify and diagnose memory performance issues, and suggest strategies to improve the efficiency of your code.
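The article is about Intel Advisor rather than any particular code, but the kind of strided-access pattern such tools flag can be sketched in plain NumPy (an illustrative analogy, not taken from the post):

```python
import numpy as np

# A C-ordered (row-major) matrix: elements within a row are contiguous,
# so row-wise traversal streams through memory sequentially, while
# column-wise traversal jumps a full row's worth of bytes per element.
a = np.zeros((1024, 1024), dtype=np.float64)
assert a.flags['C_CONTIGUOUS']

stride_between_rows, stride_within_row = a.strides
# Moving to the next element in a row steps 8 bytes; moving down a
# column steps 1024 * 8 = 8192 bytes, a classic strided-access pattern
# that wastes cache lines and memory bandwidth.
print(stride_within_row, stride_between_rows)  # 8 8192

# Transposing only swaps the strides; an explicit copy re-lays the data
# out so the new rows become contiguous again.
b = a.T.copy()
assert b.flags['C_CONTIGUOUS']
```

Restructuring loops (or data) so the innermost accesses follow the small stride is one of the simplest fixes a memory-access analysis typically suggests.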

Software-Defined Visualization with Intel Rendering Framework – No Special Hardware Needed

This sponsored post from Intel explores how the Intel Rendering Framework, which brings together a number of optimized, open source rendering libraries, can deliver better performance at a higher degree of fidelity, without requiring an investment in extra hardware. By letting the CPU do the work, visualization applications can run anywhere without specialized hardware, and users are seeing better performance than they could get from dedicated graphics hardware with its limited memory.