More Than Ever, Vectorization and Multithreading are Essential for Performance

Employing a hybrid of MPI across nodes in a cluster, multithreading with OpenMP* on each node, and vectorization of loops within each thread results in multiple performance gains. In fact, most application codes will run slower on the latest supercomputers if they run purely sequentially. This means that adding multithreading and vectorization to applications is now essential for running efficiently on the latest architectures.
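As a rough illustration of the two on-node layers described above, the sketch below combines OpenMP multithreading and loop vectorization in a single directive. It is a minimal example under stated assumptions, not code from the article: the function name dot is hypothetical, the MPI layer across nodes is left out so the example builds without an MPI library, and the combined parallel for simd construct assumes a compiler supporting OpenMP 4.0 or later (for example, built with -fopenmp). Without OpenMP enabled, the directive is ignored and the loop simply runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the article): dot product of two
// equally sized vectors, threaded and vectorized with one directive.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    double sum = 0.0;

    // "parallel for" splits the iterations across the threads of one node;
    // "simd" asks the compiler to vectorize each thread's share;
    // "reduction" gives every thread a private partial sum and combines
    // the partial sums when the loop ends.
    #pragma omp parallel for simd reduction(+:sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

In a full hybrid code, each MPI rank would call a routine like this on its local slice of the data and then combine the per-rank results with a collective such as MPI_Allreduce.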

3X Performance Boost Using Intel Advisor and Intel Trace Analyzer in Astrophysics Simulations

On today’s processors, it is crucial to both vectorize (using SIMD instructions such as AVX*) and parallelize software to realize the full performance potential of the processor. By optimizing their MHD astrophysics applications with tools from Intel Parallel Studio XE, and running on the latest Intel hardware, the NSU team achieved a performance speed-up of 3X, cutting the standard time for calculating one problem from one week to just two days.

Introduction to Parallel Programming with OpenACC

“This is the first in a series of short videos to introduce you to parallel programming with OpenACC and the PGI compilers, using C++ or Fortran. You will learn by example how to build a simple example program, how to add OpenACC directives, and how to rebuild the program for parallel execution on a multicore system. To get the most out of this video, you should download the example programs and follow along on your workstation.”
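For readers who have not seen an OpenACC directive, the sketch below shows the general shape of the "add directives" step on a simple loop. It is not one of the example programs from the video: the saxpy function and its parameters are hypothetical, chosen only for illustration. The directive takes effect only when the code is built with an OpenACC-capable compiler and the appropriate flag; otherwise it is ignored and the loop runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example (not from the video): y = a * x + y, element-wise.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const float* xp = x.data();
    float* yp = y.data();

    // "parallel loop" asks the compiler to run the iterations in parallel.
    // "copyin" marks xp as input only; "copy" marks yp as input and output,
    // so the data moves correctly if the target has separate memory.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The loop body is unchanged from the sequential version; only the directive is added, which is what makes the build-then-rebuild workflow described above practical.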

Minimal Metrics Releases PerfMiner Parallel Optimization Tool

This week Minimal Metrics announced an early-adopter program for PerfMiner, which uses lightweight, pervasive performance data collection technology, automates the collection, and mines the data for key performance indicators. These indicators, developed through Minimal Metrics’ extensive experience tuning HPC and enterprise application performance, are presented in an audience-specific, drill-down hierarchy that provides accountability for site productivity down to the performance of individual application threads.

PRIMEHPC FX10 Fujitsu Supercomputer

Fujitsu developed the first Japanese supercomputer in 1977. In the thirty-plus years since then, we have been leading the development of supercomputers with the application of advanced technologies. We now introduce the PRIMEHPC FX10, a state-of-the-art supercomputer that makes the petascale computing achieved by the “K computer” more accessible.

SAS Analytics Using Direct Memory Access

Using analytics based on Remote Direct Memory Access (RDMA) and fast, scalable external disk systems with massively parallel access to data, SAS analytics-driven organizations can deliver timely and accurate execution for data-intensive workflows such as risk management, while incorporating larger datasets than is possible with traditional NAS.

Parallel Storage Solutions for Better Performance

Using high-performance parallel storage solutions, geologists and researchers can now incorporate larger data sets and execute more seismic and reservoir simulations faster than ever before, enabling higher-fidelity geological analysis and significantly reduced exploration risk. With exploration costs high, oil and gas companies are increasingly turning to high-performance DDN storage solutions to eliminate I/O bottlenecks and minimize risk and costs, while delivering a larger number of higher-fidelity simulations in the same time as traditional storage architectures.

Cilk Plus from Intel Offers Easy Access to Performance

Intel® Cilk™ Plus is an extension to C and C++ that offers a quick and easy way to harness the power of both multicore and vector processing. The three Intel Cilk Plus keywords (cilk_spawn, cilk_sync, and cilk_for) provide a simple yet surprisingly powerful model for parallel programming, while runtime and template libraries offer a well-tuned environment for building parallel applications.

Basics For Coprocessors

“The Intel Xeon Phi coprocessor is an example of a many-core system that can greatly increase the performance of an application when used correctly. Simply taking a serial application and expecting tremendous performance gains will not work. Rewriting parts of the application will be necessary to take advantage of the architecture of the Intel Xeon Phi coprocessor.”

Code Modernization for High Performance Hardware

“Parallel software and parallel hardware, used together, will give the best results for an application. If the application is serial in nature, and the processor is serial, then there will obviously not be a great gain in performance. When the application is parallelized, but the processor is serial, again, no great gain. A third combination is when the application is serial and the hardware is parallel. Since the application cannot take advantage of the increased power of the hardware, there will not be a great performance boost. The best and really only solution is to modify the application to run in parallel, using high-performing parallel hardware.”