Introduction to Parallel Programming with OpenACC

“This is the first in a series of short videos to introduce you to parallel programming with OpenACC and the PGI compilers, using C++ or Fortran. You will learn by example how to build a simple example program, how to add OpenACC directives, and to rebuild the program for parallel execution on a multicore system. To get the most out of this video, you should download the example programs and follow along on your workstation.”
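As a rough illustration of what "adding OpenACC directives" looks like in C++, here is a minimal sketch. It is not taken from the video's example programs: the `saxpy` routine and its parameter names are hypothetical, chosen only to show a single directive on a simple loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example routine (not from the video): computes y = a*x + y.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();

    // The directive asks an OpenACC-enabled compiler to parallelize the loop.
    // The data clauses describe which array sections are read (copyin) and
    // which are read and written (copy). A compiler without OpenACC support
    // enabled ignores the pragma and runs the loop serially.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Built without OpenACC enabled, the function behaves like ordinary serial C++, which is what lets the same source be rebuilt for parallel execution later. The flags that enable OpenACC and select a multicore target vary by compiler and version, so check your compiler's documentation before rebuilding.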

PGI and NNSA to Open Source Fortran Compiler

Today the NNSA and its three national labs announced they have reached an agreement with Nvidia’s PGI software to create an open-source Fortran compiler designed for integration with the widely used LLVM compiler infrastructure.

PGI Accelerator Compilers Add OpenACC Support for x86

“Our goal is to enable HPC developers to easily port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base,” said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. “This capability will be particularly important in the race towards exascale computing in which there will be a variety of system architectures requiring a more flexible application programming approach.”

Video: OpenACC for Fortran Programmers

“Learn how to program NVIDIA GPUs using Fortran with OpenACC directives. The first half of this presentation will introduce OpenACC to new GPU and OpenACC programmers, providing the basic material necessary to start successfully using GPUs for your Fortran programs. The second half will be intermediate material, with more advanced hints and tips for Fortran programmers with larger applications that they want to accelerate with a GPU. Among the topics to be covered will be dynamic device data lifetimes, global data, procedure calls, derived type support, and much more.”

PGI Steps up with Support for Jetson TK1 and Power8

At SC14, Nvidia announced that it is developing an enhanced version of the widely used PGI optimizing compilers. The new version will allow developers to quickly develop new applications or run Linux x86-based GPU-accelerated applications on IBM POWER CPU systems with minimal effort.

PGI 14.4 Compiler Suite Adds OpenACC Features and Multicore Performance Gains

Today The Portland Group released version 14.4 of the PGI 2014 Compilers and Tools suite, with expanded OpenACC features and multicore x64 performance gains.

Free PGI Comes to OS X

This week PGI announced that Free PGI is now available for Macs running OS X. The package includes the PGI high-performance parallel C99 and Fortran 2003 compilers and parallel debugger for 64-bit and 32-bit Intel processor-based Macs.

Slidecast: New PGI 2014 Release Adds OpenACC 2.0 Features and x64 Performance Gains

In this slidecast, Doug Miles from Nvidia describes the new features and performance gains in the PGI 2014 release. “The use of accelerators in high performance computing is now mainstream,” said Douglas Miles, director of PGI Software at Nvidia. “With PGI 2014, we are taking another big step toward our goal of providing platform-independent, multi-core and accelerator programming tools that deliver outstanding performance on multiple platforms without the need for extensive, device-specific tuning.”

On the Road to Exascale: The Challenges of Portable Heterogeneous Programming

We heard some very good reviews of a talk given by Doug Miles of The Portland Group at the bi-annual Clouds, Clusters and Data for Scientific Computing technical meeting outside of Lyon, France in mid-September. Most of the talks from that meeting are available online at the CCDSC 2012 website, but the PGI talk did not include any slides. PGI has provided the Exascale Report with a copy of the transcript from the talk, which we have reproduced here with a few minor edits.

Most of today’s CPU-only large-scale systems have a similar look and feel: many homogeneous nodes communicating via MPI, where each node has a few identical processor chips, each chip has multiple identical cores, and each core has some SIMD processing capability. Programming one such system is much like programming any other, regardless of chip vendor, total number of cores, number of cores per node, SIMD width or interconnect fabric.

Setting aside accelerator-enabled systems for a minute, how did we get here? How did we reach this level of homogeneity from such heterogeneous HPC system roots? Twenty-five or 30 years ago we had vector machines, VLIW machines, SMP machines, massively parallel SIMD machines, and literally scores of different instruction set architectures. How did systems become so homogeneous?