Lenovo to Build 26.7 Petaflop SuperMUC-NG Cluster for LRZ in Germany

“Upon its completion in late 2018, the new Lenovo supercomputer (called SuperMUC-NG) will support LRZ in its groundbreaking research across a variety of complex scientific disciplines, such as astrophysics, fluid dynamics and life sciences, by offering highly available, secure and energy-efficient high-performance computing services that leverage industry-leading technology optimized to address a broad range of scientific computing applications. The LRZ installation will also feature the 20-millionth server shipped by Lenovo, a significant milestone in the company’s data center history.”

Benchmarking Optimized 3D Electromagnetic Simulation Tools

New benchmarks from Computer Simulation Technology on their recently optimized 3D electromagnetic field simulation tools compare the performance of the new Intel Xeon Scalable processors with previous-generation Intel Xeon processors. “Our team works with customers on testing models and configuration settings, making recommendations so they get a well-performing system and the best performance when running their models.”

PRACE Offers Supercomputing 101 Course

PRACE is offering an online Supercomputing 101 course through the Future Learn program. This free online course will introduce you to what supercomputers are, how they are used and how we can exploit their full computational potential to make scientific breakthroughs. “Using supercomputers, we can now conduct virtual experiments that are impossible in the real world.”

Intel’s Xeon Scalable Processors Provide Cooling Challenges for HPC

Unless you reduce node and rack density, the wattages of today’s high-powered CPUs and GPUs are simply no longer addressable with air cooling alone. Asetek explores how new processors, such as Intel’s Xeon Scalable processors, often call for more than just air cooling. “The largest Xeon Phi direct-to-chip cooled system today is the Oakforest-PACS system in Japan. The system is made up of 8,208 computational nodes using Asetek Direct-to-Chip liquid-cooled Intel Xeon Phi high-performance processors with the Knights Landing architecture. It is the highest performing system in Japan and #7 on the Top500.”

Jülich to Build 5 Petaflop Supercomputing Booster with Dell

Today Intel and the Jülich Supercomputing Centre, together with ParTec and Dell, announced plans to develop and deploy a next-generation modular supercomputing system. Leveraging the experience and results gained in the EU-funded DEEP and DEEP-ER projects, in which three of the partners have been strongly engaged, the group will develop the mechanisms required to augment JSC’s JURECA cluster with a highly scalable component named “Booster,” based on Intel’s Scalable Systems Framework (Intel SSF).

Creating Applications with the Intel Computer Vision SDK

“In order for developers to be able to focus on their application, a Vision Algorithm Designer application is included in the Intel Computer Vision SDK. This gives users a drag and drop interface that allows them to create new applications on the fly. Large and complex workflows can be modelled visually which takes the guesswork out of bringing together many different functions. In addition, customized code can be added to the workflows.”

Fathom Neural Compute Stick Enables Mobile Devices to React Cognitively

Intel-owned Movidius has introduced a fascinating new device called the Fathom Neural Compute Stick, a modular deep learning accelerator in the form of a standard USB stick. “The Fathom Neural Compute Stick is the first of its kind: A powerful, yet surprisingly efficient Deep Learning processor embedded into a standard USB stick. The Fathom Neural Compute Stick acts as a discrete neural compute accelerator, allowing devices with a USB port to run neural networks at high speed, while sipping under a single Watt of power.”

Speed Your Application with Threading Building Blocks

With modern processors that contain a large number of cores, it is necessary to structure an application to use as many cores as possible in order to get maximum performance. Explicitly developing a program to do this can take a significant amount of effort. It is important to understand the science and algorithms behind the application, and then use whatever programming techniques are available. “Intel Threading Building Blocks (TBB) can help tremendously in the effort to achieve very high performance for the application.”
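
As a minimal sketch of the kind of restructuring TBB enables (assuming the TBB headers and library are installed; the array and arithmetic below are illustrative, not taken from the article), a tbb::parallel_for lets the runtime split a loop across the available cores while the programmer writes only the loop body:

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> data(10000000, 1.0);

        // TBB partitions the iteration range into chunks and schedules the
        // chunks across its worker threads, one pool per machine.
        tbb::parallel_for(tbb::blocked_range<std::size_t>(0, data.size()),
            [&](const tbb::blocked_range<std::size_t>& r) {
                for (std::size_t i = r.begin(); i != r.end(); ++i)
                    data[i] = std::sqrt(data[i]) * 2.0;
            });

        std::printf("data[0] = %f\n", data[0]);
        return 0;
    }

Built with, for example, g++ -O2 example.cpp -ltbb, the same binary scales with however many cores the TBB scheduler finds at run time.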

Best Threads Per Core with Intel Xeon Phi

“When designing an application that contains more threads than cores, it is important to understand the optimal number of threads that should be assigned to a core. This value should be parameterized so that tests can easily be run to determine the optimum value for a given machine. One thread per core on the Intel Xeon Phi processor will give the highest performance per thread. When the number of threads per core is set at two or four, the individual thread performance may be lower, but the aggregate performance will be greater.”
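
One way to make that parameter easy to sweep is to take the core count and threads-per-core on the command line and time the aggregate run; the sketch below (plain C++ threads, not specific to the Xeon Phi, with a hypothetical workload chosen only to make the timing measurable) shows the idea:

    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <functional>
    #include <thread>
    #include <vector>

    // Hypothetical per-thread workload, used only to illustrate the sweep.
    static void do_work(std::size_t iterations, double& out) {
        double acc = 0.0;
        for (std::size_t i = 0; i < iterations; ++i)
            acc += static_cast<double>(i) * 0.5;
        out = acc;
    }

    int main(int argc, char* argv[]) {
        // Parameterize both values so one binary can sweep configurations,
        // e.g. ./sweep 64 1, ./sweep 64 2, ./sweep 64 4 on a 64-core part.
        const int cores            = (argc > 1) ? std::atoi(argv[1]) : 1;
        const int threads_per_core = (argc > 2) ? std::atoi(argv[2]) : 1;
        const std::size_t total    = static_cast<std::size_t>(cores) * threads_per_core;

        std::vector<std::thread> workers;
        std::vector<double> results(total, 0.0);

        const auto start = std::chrono::steady_clock::now();
        for (std::size_t t = 0; t < total; ++t)
            workers.emplace_back(do_work, 100000000 / total, std::ref(results[t]));
        for (auto& w : workers)
            w.join();
        const auto stop = std::chrono::steady_clock::now();

        const double seconds = std::chrono::duration<double>(stop - start).count();
        std::printf("%zu threads (%d per core): %.3f s\n", total, threads_per_core, seconds);
        return 0;
    }

Comparing the wall-clock time at one, two and four threads per core makes the per-thread versus aggregate trade-off described above directly visible for a given machine.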

NVIDIA Tesla P100 GPU Review

Accelerated computing continues to gain momentum as the HPC community moves toward exascale. Our recent Tesla P100 GPU review shows how these accelerators open up new levels of performance compared with traditional CPU-based systems, and even with NVIDIA’s previous-generation K80 GPU. We’ve got benchmarks, case studies, and more in the insideHPC Research Report on GPU Accelerators.