PRACE Offers Supercomputing 101 Course

PRACE is offering an online Supercomputing 101 course through the FutureLearn platform. This free online course will introduce you to what supercomputers are, how they are used and how we can exploit their full computational potential to make scientific breakthroughs. “Using supercomputers, we can now conduct virtual experiments that are impossible in the real world.”

Intel’s Xeon Scalable Processors Provide Cooling Challenges for HPC

Unless you reduce node and rack density, the wattages of today’s high-powered CPUs and GPUs can no longer be handled with air cooling alone. Asetek explores how new processors, such as Intel’s Xeon Scalable processors, often call for more than just air cooling. “The largest Xeon Phi direct-to-chip cooled system today is Oakforest-PACS system in Japan. The system is made up of 8,208 computational nodes using Asetek Direct-to-Chip liquid cooled Intel Xeon Phi high performance processors with Knights Landing architecture. It is the highest performing system in Japan and #7 on the Top500.”

Jülich to Build 5 Petaflop Supercomputing Booster with Dell

Intel and the Jülich Supercomputing Centre, together with ParTec and Dell, today announced plans to develop and deploy a next-generation modular supercomputing system. Leveraging the experience and results gained in the EU-funded DEEP and DEEP-ER projects, in which three of the partners have been strongly engaged, the group will develop the mechanisms required to augment JSC’s JURECA cluster with a highly scalable component named “Booster”, based on Intel’s Scalable Systems Framework (Intel SSF).

Creating Applications with the Intel Computer Vision SDK

“In order for developers to be able to focus on their application, a Vision Algorithm Designer application is included in the Intel Computer Vision SDK. This gives users a drag and drop interface that allows them to create new applications on the fly. Large and complex workflows can be modelled visually which takes the guesswork out of bringing together many different functions. In addition, customized code can be added to the workflows.”

Fathom Neural Compute Stick Enables Mobile Devices to React Cognitively

Intel-owned Movidius has introduced a fascinating new device called the Fathom Neural Compute Stick, a modular deep learning accelerator in the form of a standard USB stick. “The Fathom Neural Compute Stick is the first of its kind: A powerful, yet surprisingly efficient Deep Learning processor embedded into a standard USB stick. The Fathom Neural Compute Stick acts as a discrete neural compute accelerator, allowing devices with a USB port run neural networks at high speed, while sipping under a single Watt of power.”

Speed Your Application with Threading Building Blocks

Modern processors contain a large number of cores, so getting maximum performance means structuring an application to use as many of those cores as possible. Explicitly developing a program to do this can take a significant amount of effort. It is important to understand the science and algorithms behind the application, and then use whatever programming techniques are available. “Intel Threaded Building Blocks (TBB) can help tremendously in the effort to achieve very high performance for the application.”

Best Threads Per Core with Intel Xeon Phi

“When designing an application that contains many threads and less cores than threads, it is important to understand what is the optimal number of threads that should be assigned to a core. This value should be parameterized, in order to easily run tests to determine which is the optimum value for a given machine. One thread per core on the Intel Xeon Phi processor will give the highest performance per thread. When the number of threads per core is set at two or four, the individual thread performance may be lower, but the aggregate performance will be greater.”
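The advice to parameterize threads per core can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`total_threads`, `split_range`, `run_once`, `sweep`), the toy sum-of-squares workload, and the candidate values 1, 2 and 4 are hypothetical and do not come from the article. Python threads also share a global interpreter lock and are not pinned to cores here, so the timings will not reproduce Xeon Phi behavior; the point is only the shape of the test harness, in which threads per core is an input rather than a hard-coded constant.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def total_threads(cores: int, threads_per_core: int) -> int:
    """Total worker threads for a cores x threads-per-core setting."""
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be positive")
    return cores * threads_per_core


def split_range(n: int, parts: int):
    """Split range(n) into `parts` contiguous chunks that cover it exactly."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + step + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(bounds):
    """Toy workload (hypothetical): sum of squares over [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def run_once(n: int, cores: int, threads_per_core: int):
    """Run the workload with one setting; return (result, elapsed seconds)."""
    workers = total_threads(cores, threads_per_core)
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        result = sum(pool.map(partial_sum, split_range(n, workers)))
    return result, time.perf_counter() - t0


def sweep(n: int, cores: int, candidates=(1, 2, 4)):
    """Time the same workload for each candidate threads-per-core value."""
    return {tpc: run_once(n, cores, tpc)[1] for tpc in candidates}


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    for tpc, elapsed in sweep(200_000, cores).items():
        print(f"{tpc} thread(s) per core: {elapsed:.4f} s")
```

Because the threads-per-core value is passed in rather than fixed, the same harness can be rerun on each target machine and the fastest setting chosen from the measured timings.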

NVIDIA Tesla P100 GPU Review

Accelerated computing continues to gain momentum as the HPC community moves towards Exascale. Our recent Tesla P100 GPU review shows how these accelerators are opening up new worlds of performance vs. traditional CPU-based systems and even vs. NVIDIA’s previous K80 GPU product. We’ve got benchmarks, case studies, and more in the insideHPC Research Report on GPU Accelerators.

FPGA Myths

Data center sprawl is now understood to be expensive, and it may not deliver performance increases for all types of applications, so new technologies are coming to the rescue. A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence “field-programmable”. While the use of GPUs and HPC accelerators is generally understood today, there are a number of misconceptions about FPGAs that need to be cleared up.

Memory Modes For Increased Performance on Intel Xeon Phi

The Intel Xeon Phi processor supports two types of memory, MCDRAM and DDR, and can organize them in three memory modes (cache, flat, and hybrid). These memory subsystems are complementary but can be used in different ways, depending on the application that is being executed. “By using these two types of memory in the same system gives flexibility to the overall system and will show an increase in performance for almost any application.”