Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


Achieving High-Performance Math Processing with Intel MKL 2017

“Many of the libraries developed in the 70s and 80s for core linear algebra and scientific math computation, such as BLAS, LAPACK, FFT, are still in use today with C, C++, Fortran, and even Python programs. With MKL, Intel has engineered a ready-to-use, royalty-free library that implements these numerical algorithms optimized specifically to take advantage of the latest features of Intel chip architectures. Even the best compiler can’t compete with the level of performance possible from a hand-optimized library. Any application that already relies on the BLAS or LAPACK functionality will achieve better performance on Intel and compatible architectures just by downloading and re-linking with Intel MKL.”

Managing Lots of Tasks for Intel Xeon Phi

“OpenMP, Fortran 2008 and TBB are standards that can help to create parallel areas of an application. MKL could also be considered to be part of this family, because it uses OpenMP within the library. OpenMP is well known and has been used for quite some time and is continues to be enhanced. Some estimates are as high as 75 % of cycles used are for Fortran applications. Thus, in order to modernize some of the most significant number crunchers today, Fortran 2008 should be investigated. TBB is for C++ applications only, and does not require compiler modifications. An additional benefit to using OpenMP and Fortran 2008 is that these are standards, which allows code to be more portable.”

A Decade of Multicore Parallelism with Intel TBB

While HPC developers worry about squeezing out the ultimate performance while running an application on dedicated cores, Intel TBB tackles a problem that HPC users never worry about: How can you make parallelism work well when you share the cores that you run upon?” This is more of a concern if you’re running that application on a many-core laptop or workstation than a dedicated supercomputer because who knows what will also be running on those shared cores. Intel Threaded Building Blocks reduce the delays from other applications by utilizing a revolutionary task-stealing scheduler. This is the real magic of TBB.

Exascale Computing: A Race to the Future of HPC

In this week’s Sponsored Post, Nicolas Dube of Hewlett Packard Enterprise outlines the future of HPC and the role and challenges of exascale computing in this evolution. The HPE approach to exascale is geared to breaking the dependencies that come with outdated protocols. Exascale computing will allow users to process data, run systems, and solve problems at a totally new scale, which will become increasingly important as the world’s problems grow ever larger and more complex.

OpenFabrics Alliance Workshop 2017 – Call for Sessions Open

Each year the OpenFabrics Alliance (OFA) hosts an annual workshop devoted to advancing the state of the art in networking. “One secret to the enduring success of the workshop is the OFA’s emphasis on hosting an interactive, community-driven event. To continue that trend, we are once again reaching out to the community to create a rich program that addresses topics important to the networking industry. We’re looking for proposals for workshop sessions.”

Programming for High Performance Processors

“Managing the work on each node can be referred to as Domain parallelism. During the run of the application, the work assigned to each node can be generally isolated from other nodes. The node can work on its own and needs little communication with other nodes to perform the work. The tools that are needed for this are MPI for the developer, but can take advantage of frameworks such as Hadoop and Spark (for big data analytics). Managing the work for each core or thread will need one level down of control. This type of work will typically invoke a large number of independent tasks that must then share data between the tasks.”

Remote Visualization Accelerating Innovation Across Multiple Industries

Remote visualization tools allow employees to dramatically improve productivity by accessing business-critical data and programs regardless of their location. Remote visualization technologies allow users to launch software applications on the server side and display the results locally, letting them leverage the bandwidth and compute power of the cluster while circumventing the latency and security risks of downloading large amounts of data onto their local client.

Speed Your Application with Threading Building Blocks

With modern processors that contain a large number of cores, to get maximum performance it is necessary to structure an application to use as many cores as possible. Explicitly developing a program to do this can take a significant amount of effort. It is important to understand the science and algorithms behind the application, and then use whatever programming techniques that are available. “Intel Threaded Building Blocks (TBB) can help tremendously in the effort to achieve very high performance for the application.”

GPU Accelerated Servers for Deep Learning Applications

Applications such as machine learning and deep learning require incredible compute power, and these are becoming more crucial to daily life every day. These applications help provide artificial intelligence for self-driving cars, climate prediction, drugs that treat today’s worst diseases, plus other solutions to more of our world’s most important challenges. There is a multitude of ways to increase compute power but one of the easiest is to use the most powerful GPUs.

Fast Networking for Next Generation Systems

“The Intel Omni-Path Architecture is an example of a networking system that has been designed for the Exascale era. There are many features that will enable this massive scaling of compute resources. Features and functionality are designed in at both the host and the fabric levels. This enables very large scaling when all of the components are designed together. Increased reliability is a result of integrating the CPU and fabric, which will be critical as the number of nodes expands well beyond any system in operation today. In addition, tools and software that have been designed to be installed and managed at the very large number of compute nodes that will be necessary to achieve this next level of performance.”