Six Steps Towards Better Performance on Intel Xeon Phi

“As with all new technology, developers will have to create processes in order to modernize applications to take advantage of any new feature. Rather than randomly trying to improve the performance of an application, it is wise to be very familiar with the application and use available tools to understand bottlenecks and look for areas of improvement.”

Vectorization Leads to Performance Gains

Applications that can take advantage of the new vectorization capabilities of the Intel Xeon Phi processor will show tremendous performance gains. “When considering vectorization, there are different tools that can assist the developer in determining where to look further. The first is to look at the optimization reports that are generated by the Intel compiler and then to also use the Vector Analyzer that can give specific advice on what to do to get more vectorization from the code.”
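As a rough illustration of what such reports tend to distinguish, the C++ sketch below contrasts a loop that a vectorizer can usually handle with one that it usually cannot. The function names `scaled_add` and `prefix_sum` are hypothetical and do not come from the article; how a particular compiler treats either loop depends on its version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Likely vectorizable: unit-stride access, no dependence between iterations,
// and a trip count that is known before the loop starts.
void scaled_add(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}

// Counter-example: each iteration reads the value written by the previous
// one. This loop-carried dependence blocks straightforward vectorization.
void prefix_sum(std::vector<float>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        v[i] += v[i - 1];
    }
}
```

Building with an optimization report enabled (for example `-qopt-report` with the Intel compiler or `-fopt-info-vec` with GCC) should show which of the two loops was vectorized and, for the other, why not.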

In Memory Computing Speeds Results

In-Memory Computing can accelerate traditional applications by using a memory-first design. Applicable to a wide range of domains, In-Memory Computing and In-Memory Data Grids take advantage of the latest trends in computer systems technology. “In-memory computing is designed to address some of the most critical real-time task requirements today. These include real-time fraud detection, biometrics and border security, and financial risk analytics. All of these use cases require very low latency access to very large amounts of data, which results in faster and more accurate decisions.”

Managing Lots of Tasks for Intel Xeon Phi

“OpenMP, Fortran 2008 and TBB are standards that can help to create parallel areas of an application. MKL could also be considered to be part of this family, because it uses OpenMP within the library. OpenMP is well known, has been used for quite some time and continues to be enhanced. Some estimates suggest that as much as 75% of cycles are used by Fortran applications. Thus, in order to modernize some of the most significant number crunchers today, Fortran 2008 should be investigated. TBB is for C++ applications only, and does not require compiler modifications. An additional benefit of using OpenMP and Fortran 2008 is that these are standards, which allows code to be more portable.”
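To make the idea of a "parallel area" concrete, here is a minimal C++ sketch of an OpenMP work-sharing loop with a reduction. The function name `dot` is hypothetical and not taken from the article. The pragma only takes effect when the compiler's OpenMP option is enabled (for example `-fopenmp` with GCC); otherwise it is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Dot product of two equally sized vectors.
// With OpenMP enabled, iterations are divided among threads and each thread
// accumulates a private partial sum that is combined when the loop ends.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long long n = static_cast<long long>(a.size());
    const double* ap = a.data();
    const double* bp = b.data();
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) {
        sum += ap[i] * bp[i];
    }
    return sum;
}
```

Because the directive is a standard pragma rather than a library call, the same source builds unchanged with or without OpenMP support, which is the portability benefit the quote refers to.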

Programming for High Performance Processors

“Managing the work on each node can be referred to as domain parallelism. During the run of the application, the work assigned to each node can generally be isolated from other nodes. The node can work on its own and needs little communication with other nodes to perform the work. The main tool the developer needs for this is MPI, although frameworks such as Hadoop and Spark (for big data analytics) can also be used. Managing the work for each core or thread requires control one level further down. This type of work will typically involve a large number of independent tasks that must then share data between the tasks.”
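One common way to assign isolated work to each node is a block decomposition of the problem domain. The C++ sketch below is an assumption made for illustration, not a method described in the article: the `Range` type and the `block_range` function are hypothetical, and in a real MPI program the `rank` and `ranks` arguments would typically come from `MPI_Comm_rank` and `MPI_Comm_size`.

```cpp
#include <cassert>
#include <cstddef>

// Half-open index range [begin, end) owned by one rank.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Split n items as evenly as possible across `ranks` processes.
// The first (n % ranks) ranks each receive one extra item.
Range block_range(std::size_t n, std::size_t ranks, std::size_t rank) {
    assert(ranks > 0 && rank < ranks);
    const std::size_t base = n / ranks;
    const std::size_t extra = n % ranks;
    const std::size_t begin = rank * base + (rank < extra ? rank : extra);
    const std::size_t size = base + (rank < extra ? 1 : 0);
    return Range{begin, begin + size};
}
```

Each rank can then process only its own range and exchange boundary data with its neighbours when needed, which keeps communication between nodes low.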

Building for the Future Aurora Supercomputer at Argonne

“Argonne National Labs has created a process to assist in moving large applications to a new system. Their current HPC system, Mira, will give way to the next generation system, Aurora, which is part of the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) joint procurement. Since Aurora contains technology that was not available in Mira, the challenge is to give scientists and developers access to some of the new technology well before the new system goes online. This allows for a more productive environment once the full-scale new system is up.”

Artificial Intelligence Becomes More Accessible

With the advent of heterogeneous computing systems that combine main CPUs with connected processors able to ingest and process tremendous amounts of data and run complex algorithms, artificial intelligence (AI) technologies are beginning to take hold in a variety of industries. Massive datasets can now be used to drive innovation in areas such as autonomous driving systems, power grid control, and combining data to arrive at profitable decisions. Read how AI can now be used in various industries using the latest hardware and software.

Speed Your Application with Threading Building Blocks

With modern processors that contain a large number of cores, getting maximum performance requires structuring an application to use as many cores as possible. Explicitly developing a program to do this can take a significant amount of effort. It is important to understand the science and algorithms behind the application, and then use whatever programming techniques are available. “Intel Threading Building Blocks (TBB) can help tremendously in the effort to achieve very high performance for the application.”

How Engility Delivers HPC

In this special feature, our own MichaelS reports on his SC16 meeting with Engility, a premier provider of integrated services for the U.S. government. “Complex High Performance Computing environments require careful planning, deep investigation into the technologies available and the ability to bring on-line a large system. Engility is uniquely positioned to work with demanding customers that require close collaboration in order to bring on-line state-of-the-art systems.”

Fast Networking for Next Generation Systems

“The Intel Omni-Path Architecture is an example of a networking system that has been designed for the Exascale era. There are many features that will enable this massive scaling of compute resources. Features and functionality are designed in at both the host and the fabric levels. This enables very large scaling when all of the components are designed together. Increased reliability is a result of integrating the CPU and fabric, which will be critical as the number of nodes expands well beyond any system in operation today. In addition, the tools and software have been designed to be installed and managed across the very large number of compute nodes that will be necessary to achieve this next level of performance.”