Intel® Graphics Performance Analyzer for Faster Graphics Performance

“Just as developers need tools to understand the performance of a CPU-intensive application in order to modify the code for higher performance, so do those who develop interactive 3D computer graphics applications. An excellent tool for this purpose is the Intel Graphics Performance Analyzer set. This tool, which is free to download, can help the developer understand, at a very low level, how the application is performing, from a number of aspects.”

Creating Applications with the Intel Computer Vision SDK

“In order for developers to be able to focus on their application, a Vision Algorithm Designer application is included in the Intel Computer Vision SDK. This gives users a drag-and-drop interface that allows them to create new applications on the fly. Large and complex workflows can be modelled visually, which takes the guesswork out of bringing together many different functions. In addition, customized code can be added to the workflows.”

Let The Compiler Do Its Thing

“In the past, developers would get the best results if a loop was unrolled, that is, duplicating the body as many times as needed so that the operations could be performed using full vectors. The number of iterations would reflect the hardware that the code was targeted towards. Since the application may have to run on different hardware in the future, unrolling tuned to today’s generation of hardware may compromise results down the road. In fact, it is better to let modern compilers do the unrolling.”
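
To make this concrete, here is a minimal C sketch: the dot-product loop below is deliberately written plainly, with no hand-unrolling, on the assumption that an optimizing build (for example, icc -O3 or gcc -O3) will unroll and vectorize it for whatever hardware it is compiled on.

    #include <stddef.h>

    /* A plain loop, deliberately not hand-unrolled. With optimization
       enabled (e.g., -O3), the compiler chooses an unroll factor and
       vector width suited to the target, so the source stays portable. */
    double dot(const double *a, const double *b, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

Because the unroll factor is chosen at compile time, rebuilding the same source for a new processor adapts it automatically, whereas a hand-unrolled version would have baked in yesterday’s vector length.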

Six Steps Towards Better Performance on Intel Xeon Phi

“As with all new technology, developers will have to create processes in order to modernize applications to take advantage of any new feature. Rather than randomly trying to improve the performance of an application, it is wise to be very familiar with the application and use available tools to understand bottlenecks and look for areas of improvement.”

Vectorization Leads to Performance Gains

Applications that can take advantage of the new vectorization capabilities of the Intel Xeon Phi processor will show tremendous performance gains. “When considering vectorization, there are different tools that can assist the developer in determining where to look further. The first is to look at the optimization reports that are generated by the Intel compiler; the second is to use the Vector Analyzer, which can give specific advice on what to do to get more vectorization from the code.”
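
As a small illustration (the flags below apply to the Intel classic compiler, and the loop is an invented example, not from the article), a vectorizable loop can be checked against the compiler’s optimization report:

    /* saxpy.c -- build with, e.g.:
       icc -O3 -qopt-report=2 -qopt-report-phase=vec -c saxpy.c
       The generated report (saxpy.optrpt) states whether this loop
       was vectorized and, if not, what prevented it. */
    void saxpy(float a, const float *restrict x, float *restrict y, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

The restrict qualifiers tell the compiler the arrays do not overlap, a common fix when the report complains about assumed aliasing.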

In-Memory Computing Speeds Results

In-Memory Computing can accelerate traditional applications by using a memory-first design. Applicable to a wide range of domains, In-Memory Computing and In-Memory Data Grids take advantage of the latest trends in computer systems technology. “In-memory computing is designed to address some of the most critical and real-time task requirements today. These include real-time fraud detection, biometrics and border security, and financial risk analytics. All of these use cases require very low-latency access to very large amounts of data, which results in faster and more accurate decisions.”
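
A minimal C sketch of the memory-first idea (the fraud-scoring table, its size, and the function names are invented for illustration): the data is loaded from disk once at startup, and every subsequent query is served from RAM.

    #include <stdio.h>
    #include <stdlib.h>

    #define N_ACCOUNTS 1000000  /* illustrative table size */

    /* Load the whole score table into RAM once at startup. */
    static float *load_scores(const char *path)
    {
        float *scores = malloc(N_ACCOUNTS * sizeof *scores);
        FILE *f = scores ? fopen(path, "rb") : NULL;
        if (!f || fread(scores, sizeof *scores, N_ACCOUNTS, f) != N_ACCOUNTS) {
            if (f) fclose(f);
            free(scores);
            return NULL;
        }
        fclose(f);
        return scores;  /* stays resident for the life of the service */
    }

    /* Hot path: no I/O at all, just a memory read. */
    static inline float risk_score(const float *scores, long account)
    {
        return scores[account % N_ACCOUNTS];
    }

The point of the design is that the latency-critical path (risk_score) never touches disk; persistence and updates are handled off the hot path.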

Managing Lots of Tasks for Intel Xeon Phi

“OpenMP, Fortran 2008 and TBB are standards that can help to create parallel areas of an application. MKL could also be considered part of this family, because it uses OpenMP within the library. OpenMP is well known, has been used for quite some time, and continues to be enhanced. Some estimates suggest that as much as 75% of cycles used are for Fortran applications. Thus, in order to modernize some of the most significant number crunchers today, Fortran 2008 should be investigated. TBB is for C++ applications only, and does not require compiler modifications. An additional benefit to using OpenMP and Fortran 2008 is that these are standards, which allows code to be more portable.”
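
A minimal OpenMP sketch in C (process_chunk is a placeholder, not from the article) shows the task model the quote refers to: one thread creates many independent tasks, and the runtime spreads them across the available cores.

    #include <omp.h>

    /* Stand-in for real per-task work on chunk 'id'. */
    void process_chunk(int id) { (void)id; }

    int main(void)
    {
        #pragma omp parallel
        #pragma omp single          /* one thread creates the tasks... */
        {
            for (int i = 0; i < 1000; i++) {
                #pragma omp task firstprivate(i)
                process_chunk(i);   /* ...the whole team executes them */
            }
        }   /* all tasks are guaranteed complete at the region's end */
        return 0;
    }

Built with any OpenMP-capable compiler (e.g., gcc -fopenmp), the same source runs unchanged on machines with different core counts, which is the portability benefit of sticking to the standard.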

Programming for High Performance Processors

“Managing the work on each node can be referred to as Domain parallelism. During the run of the application, the work assigned to each node can be generally isolated from other nodes. The node can work on its own and needs little communication with other nodes to perform the work. The main tool the developer needs for this is MPI, though frameworks such as Hadoop and Spark (for big data analytics) can also be taken advantage of. Managing the work for each core or thread requires control one level down. This type of work will typically invoke a large number of independent tasks that must then share data between the tasks.”
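
A minimal MPI sketch of domain parallelism (the partial-sum computation is an invented stand-in for real per-node work): each rank computes independently on its own slice of the problem, and the only communication is the final reduction.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank works on its own domain: a contiguous index range. */
        const long n = 1000000;
        long lo = rank * n / size, hi = (rank + 1) * n / size;
        double local = 0.0;
        for (long i = lo; i < hi; i++)
            local += (double)i;

        /* The only inter-node communication: combine partial results. */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("total = %.0f\n", total);

        MPI_Finalize();
        return 0;
    }

Run with, for example, mpirun -np 4 ./a.out; the decomposition scales to any number of ranks because each rank derives its own range from its rank number.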

Building for the Future Aurora Supercomputer at Argonne

“Argonne National Laboratory has created a process to assist in moving large applications to a new system. Their current HPC system, Mira, will give way to the next-generation system, Aurora, which is part of the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) joint procurement. Since Aurora contains technology that was not available in Mira, the challenge is to give scientists and developers access to some of the new technology well before the new system goes online. This allows for a more productive environment once the full-scale new system is up.”

Artificial Intelligence Becomes More Accessible

With the advent of heterogeneous computing systems that combine main CPUs with connected processors that can ingest and process tremendous amounts of data and run complex algorithms, artificial intelligence (AI) technologies are beginning to take hold in a variety of industries. Massive datasets can now be used to drive innovation in areas such as autonomous driving systems, power grid control, and data-driven business decisions. Read how AI can now be used in various industries using the latest hardware and software.