PGI Steps Up with Support for Jetson TK1 and POWER8

At SC14, Nvidia announced that it is developing an enhanced version of the widely used PGI optimizing compilers. The new version will let developers quickly create new applications, or run existing Linux x86-based GPU-accelerated applications on IBM POWER CPU systems, with minimal effort.

PGI 14.4 Compiler Suite Adds OpenACC Features and Multicore Performance Gains

Today The Portland Group released Version 14.4 of the PGI 2014 Compilers and Tools suite, which adds expanded OpenACC features and multicore x64 performance gains.

Free PGI Comes to OS X

This week, PGI announced that Free PGI is now available for Macs running OS X. The package includes the PGI high-performance parallel C99 and Fortran 2003 compilers and the PGI parallel debugger for 64-bit and 32-bit Intel processor-based Macs.

Slidecast: New PGI 2014 Release Adds OpenACC 2.0 Features and x64 Performance Gains

In this slidecast, Doug Miles of Nvidia describes the new features and performance gains in the PGI 2014 release. “The use of accelerators in high performance computing is now mainstream,” said Douglas Miles, director of PGI Software at Nvidia. “With PGI 2014, we are taking another big step toward our goal of providing platform-independent, multi-core and accelerator programming tools that deliver outstanding performance on multiple platforms without the need for extensive, device-specific tuning.”

On the Road to Exascale: The Challenges of Portable Heterogeneous Programming

We heard very good reviews of a talk given by Doug Miles of The Portland Group at the biennial Clouds, Clusters and Data for Scientific Computing (CCDSC) technical meeting outside Lyon, France, in mid-September. Most of the talks from that meeting are available online at the CCDSC 2012 website, but the PGI talk did not include any slides. PGI has provided The Exascale Report with a copy of the transcript of the talk, which we have reproduced here with a few minor edits.

Most of today’s CPU-only large-scale systems have a similar look and feel: many homogeneous nodes communicating via MPI, each node with a few identical processor chips, each chip with multiple identical cores, and each core with some SIMD processing capability. Programming one such system is much like programming any other, regardless of chip vendor, total number of cores, number of cores per node, SIMD width or interconnect fabric.

Setting aside accelerator-enabled systems for a minute, how did we get here? How did we reach this level of homogeneity from such heterogeneous HPC system roots? Twenty-five or thirty years ago we had vector machines, VLIW machines, SMP machines, massively parallel SIMD machines, and literally scores of different instruction set architectures. How did systems become so homogeneous?