Facilitate HPC Deployments with Reference Designs for Intel Scalable System Framework

With the Intel Scalable System Framework Architecture Specification and Reference Designs, Intel is making it easier to accelerate time to discovery through high-performance computing. The Reference Architectures (RAs) and Reference Designs take Intel Scalable System Framework a step further. They show how to deploy it so that users can run their workloads with confidence and system builders can innovate and differentiate their designs.

Shared Memory and MPI 3.0

As first multi-socket and then multi-core systems became the standard, the Message Passing Interface (MPI) became one of the most popular programming models for applications that run in parallel across many sockets and cores. Shared memory programming interfaces such as OpenMP have let developers exploit the shared memory within each server, so systems that combine many individual servers have typically been programmed with two different programming models at the same time. The MPI 3.0 standard adds an interprocess shared memory extension (MPI SHM), which allows MPI processes on the same node to access a common region of memory directly.

Arithmetic Intensity of Stencil Operations

Applications that use 3D Finite Difference (3DFD) calculations are numerically intensive and can be heavily optimized to take advantage of the accelerators available in today's systems. Their performance can, and should, be optimized at the level of the numerical stencils they use. Choices made when designing and implementing these algorithms affect the Arithmetic Intensity (AI), a measure of an implementation's efficiency defined as the ratio of floating-point operations (flops) performed to bytes of memory traffic.
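As a rough illustration of how AI is computed, the C sketch below counts flops and bytes for a simple 7-point, second-order stencil in double precision. The stencil form, the coefficient names `c0` and `c1`, and the two memory-traffic models (no cache reuse versus perfect reuse, ignoring write-allocate traffic) are assumptions chosen for illustration; they are not the 3DFD implementation discussed in the article.

```c
#include <stddef.h>

/* Flops per grid point for the update in stencil7() below (an assumed
 * 7-point form): 5 adds to sum the 6 neighbours, 1 multiply by c1,
 * 1 multiply by c0 and 1 add to combine = 8 flops. */
#define STENCIL7_FLOPS 8.0

/* Apply the 7-point stencil to the interior of an nx*ny*nz grid,
 * stored with x varying fastest. */
static void stencil7(const double *u, double *out,
                     size_t nx, size_t ny, size_t nz,
                     double c0, double c1)
{
    const size_t sy = nx, sz = nx * ny;   /* strides in y and z */
    for (size_t k = 1; k + 1 < nz; ++k)
        for (size_t j = 1; j + 1 < ny; ++j)
            for (size_t i = 1; i + 1 < nx; ++i) {
                size_t p = k * sz + j * sy + i;
                double nb = u[p - 1] + u[p + 1] + u[p - sy] + u[p + sy]
                          + u[p - sz] + u[p + sz];
                out[p] = c0 * u[p] + c1 * nb;
            }
}

/* Arithmetic intensity = floating-point operations / bytes of memory traffic. */
static double arithmetic_intensity(double flops, double bytes)
{
    return flops / bytes;
}

/* Assumed model 1: all 7 input reads go to memory, plus 1 write. */
static double bytes_no_reuse(void)   { return (7 + 1) * sizeof(double); }

/* Assumed model 2: neighbours are reused from cache, so each point is read
 * once and written once (write-allocate traffic ignored). */
static double bytes_full_reuse(void) { return (1 + 1) * sizeof(double); }
```

With 8-byte doubles, the no-reuse model gives 8 / 64 = 0.125 flop/byte and the full-reuse model gives 8 / 16 = 0.5 flop/byte. The flop count is the same in both cases; only the memory traffic changes, which is why cache-blocking and similar reuse optimizations raise AI.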

Intel Updates Developer Toolkit with Data Analytics Acceleration Library

Today, Intel released Intel Parallel Studio XE 2016, the next iteration of its developer toolkit for HPC and technical computing applications. This release introduces the Intel Data Analytics Acceleration Library, a library that helps big data developers turn large volumes of data into meaningful information with advanced analytics algorithms.

Video: Intel Vector Advisor Unlocks Code Performance

In this video, Rick Leinecker from Slashdot Media describes the Vectorization Advisor, one of the new additions to the Intel Parallel Studio XE suite. “Vectorization Advisor is an analysis tool that lets you identify if loops utilize modern SIMD instructions or not, what prevents vectorization, what is performance efficiency and how to increase it. Vectorization Advisor shows compiler optimization reports in user-friendly way, and extends them with multiple other metrics, like loop trip counts, CPU time, memory access patterns and recommendations for optimization.”
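To make "what prevents vectorization" concrete, here is a small C sketch of two loops of the kind such a tool reports on. The function names are hypothetical, and whether a given compiler actually vectorizes each loop depends on the compiler and its flags; its optimization report is the authority.

```c
#include <stddef.h>

/* Independent iterations: each a[i] depends only on b[i] and c[i], so
 * several iterations can be processed by one SIMD instruction.
 * `restrict` promises the arrays do not overlap, removing an aliasing
 * concern that could otherwise block vectorization. */
void saxpy_like(float *restrict a, const float *restrict b,
                const float *restrict c, float s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = s * b[i] + c[i];
}

/* Loop-carried dependence: a[i] needs the a[i - 1] computed in the
 * previous iteration, so iterations cannot simply run side by side in
 * SIMD lanes. This is a typical case a vectorization report flags. */
void running_sum(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
}
```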

Video: Beta Review of Intel Parallel Studio XE 2016

In this video, Rick Leinecker from Slashdot Media reviews the beta version of Intel Parallel Studio XE 2016. Leinecker describes several of the notable features and updates, including OpenMP enhancements, vastly improved computer vision and image processing, and the Data Analytics Acceleration Library.

Gather Scatter Operations

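Gather and scatter operations are used in many domains. A gather reads elements from scattered memory locations, selected through an index array, into contiguous storage; a scatter writes contiguous data back out to scattered locations. Using these operations on a SIMD architecture creates programming challenges, because the elements cannot be moved with a single contiguous vector load or store.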
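A minimal C sketch of the three common patterns follows, using the OpenMP 4.0 `omp simd` directive to ask the compiler to vectorize where it is safe. The function names are hypothetical, and the scatter version assumes the indices are unique; when indices can repeat (for example in a histogram), SIMD lanes may target the same element, so the scatter-add is left scalar here.

```c
#include <stddef.h>

/* Gather: read scattered elements src[idx[i]] into contiguous dst[i].
 * Only reads are scattered, so iterations are independent. */
void gather(double *restrict dst, const double *restrict src,
            const int *restrict idx, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]];
}

/* Scatter: write contiguous src[i] to scattered dst[idx[i]].
 * Assumes the indices in idx are unique. */
void scatter(double *restrict dst, const double *restrict src,
             const int *restrict idx, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}

/* Scatter-add with possibly repeated indices: lanes hitting the same
 * element would conflict, so this loop is written without omp simd. */
void scatter_add(double *dst, const double *src, const int *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] += src[i];
}
```

Without OpenMP enabled at compile time, the pragmas are ignored and the loops still produce the same results.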

Fortran Still Going Strong

Fortran is still going strong. NERSC estimates that more than half of the hours on its systems are used by Fortran codes. That is remarkable, given that Fortran first appeared about 60 years ago.

Numerical Optimization for Deep Learning

“With the advent of massively parallel computing coprocessors, numerical optimization for deep-learning disciplines is now possible. Complex real-time pattern recognition, for example, that can be used for self driving cars and augmented reality can be developed and high performance achieved with the use of specialized, highly tuned libraries. By just using the Message Passing Interface (MPI) API, very high performance can be attained on hundreds to thousands of Intel Xeon Phi processors.”

Interview: Powering up Vectorization with Intel Parallel Studio XE 2015

“The thing that really excites me is looking at OpenMP 4.0. We’ve got virtually a complete set of 4.0 features. OpenMP 4.0 brings together tasking, which it’s had since its start in ’97, with new capabilities for vectorization and for offload. Bringing those together, and being able to do them at the same time, is extraordinarily powerful. I love teaching classes about it and seeing what people can do with it. And now it’s fully supported in our products.”
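As a rough sketch of combining tasking and vectorization in OpenMP 4.0, the C code below runs two independent updates as tasks and vectorizes each task's loop with `omp simd`, calling a helper marked `declare simd`. The function names are hypothetical, the two output arrays are assumed not to overlap, and offload (`omp target`) is not shown.

```c
#include <stddef.h>

/* `declare simd` (OpenMP 4.0) asks the compiler to also generate a
 * vector version of this helper that SIMD loops can call. */
#pragma omp declare simd
static double scale_add(double x, double y, double s)
{
    return s * x + y;
}

/* Vectorized inner loop: y[i] = s * x[i] + y[i]. */
static void axpy(double *restrict y, const double *restrict x,
                 double s, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = scale_add(x[i], y[i], s);
}

/* Two independent updates run as OpenMP tasks created by one thread;
 * each task's loop is vectorized. Without OpenMP enabled, the pragmas
 * are ignored and the code runs serially with the same result. */
void two_axpys(double *y1, const double *x1,
               double *y2, const double *x2, double s, size_t n)
{
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        axpy(y1, x1, s, n);
        #pragma omp task
        axpy(y2, x2, s, n);
        #pragma omp taskwait
    }
}
```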