“As with all new technology, developers will have to create processes in order to modernize applications to take advantage of any new feature. Rather than randomly trying to improve the performance of an application, it is wise to be very familiar with the application and use available tools to understand bottlenecks and look for areas of improvement.”
DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference. “This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented.”
The European PRACE initiative has published a new Best Practice Guide for Intel Xeon Phi, Knights Landing Edition. “This best practice guide provides information about Intel’s MIC architecture and programming models for the Intel Xeon Phi co-processor in order to enable programmers to achieve good performance of their applications. The guide covers a wide range of topics, from a description of the Intel Xeon Phi co-processor hardware, through the basic programming models and advice on porting programs, up to tools and strategies for analyzing and improving application performance.”
Argonne has selected 10 computational science and engineering research projects for its Aurora Early Science Program starting this month. Aurora, a massively parallel, manycore Intel-Cray supercomputer, will be ALCF’s next leadership-class computing resource and is expected to arrive in 2018. The Early Science Program helps lay the path for hundreds of other users by doing actual science, using real scientific applications, to ready a future machine. “As with any bleeding edge resource, there’s testing and debugging that has to be done,” said ALCF Director of Science Katherine Riley.
In this special guest feature, James Reinders looks at Intel Xeon Phi processors from a programmer’s perspective. “How does a programmer think of Intel Xeon Phi processors? In this brief article, I will convey how I, as a programmer, think of them. In subsequent articles, I will dive a bit more into details of various programming modes, and techniques employed for some key applications. In this article, I will endeavor to not stray into deep details – but rather offer an approachable perspective on how to think about programming for Intel Xeon Phi processors.”
Nor-Tech reports that Caltech is upgrading its Nor-Tech demo cluster with Intel Xeon Phi. The demo cluster is a no-cost, no-strings opportunity for current and prospective clients to test-drive simulation applications on a cutting-edge Nor-Tech HPC equipped with Intel Xeon Phi and other high-demand platforms installed and configured. Users can also integrate their existing platforms into the demo cluster.
“Guided by the principles of interactive supercomputing, Lincoln Laboratory was responsible for a lot of the early work on machine learning and neural networks. We now have a world-class group investigating speech and video processing as well as machine language topics including theoretical foundations, algorithms and applications. In the process, we are changing the way we go about computing. Over the years we have tended to assign specific systems to service a discrete market, audience or project. But today those once highly specialized systems are becoming increasingly heterogeneous. Users are interacting with computational resources that exhibit a high degree of autonomy. The system, not the user, decides on the computer hardware and software that will be used for the job.”
“OpenMP, Fortran 2008 and TBB are standards that can help to create parallel areas of an application. MKL could also be considered to be part of this family, because it uses OpenMP within the library. OpenMP is well known, has been used for quite some time, and continues to be enhanced. By some estimates, as much as 75% of cycles used are for Fortran applications. Thus, in order to modernize some of the most significant number crunchers today, Fortran 2008 should be investigated. TBB is for C++ applications only, and does not require compiler modifications. An additional benefit to using OpenMP and Fortran 2008 is that these are standards, which allows code to be more portable.”
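As a rough illustration of what "creating a parallel area" with OpenMP can look like, here is a minimal sketch in C. It is not taken from the quoted article; the function name `sum_of_squares` and the choice of a sum-of-squares loop are assumptions made purely for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not from the article): sums the squares of n doubles.
   The pragma asks an OpenMP-enabled compiler to split the loop iterations
   across threads and to combine the per-thread partial sums into `total`. */
double sum_of_squares(const double *x, size_t n)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i) {
        total += x[i] * x[i];
    }

    return total;
}
```

With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag. Without it the pragma is ignored and the loop runs serially, which hints at the portability benefit of a directive-based standard that the quote mentions.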
“Many supercomputer users, like the big DOE labs, are implementing these next generation systems. They are now engaged in significant code modernization efforts to adapt their key present and future applications to the new processing paradigm, and to bring their internal and external users up to speed. For some in the HPC community, this creates unanticipated challenges along with great opportunities.”
“The University of Colorado, Boulder supports researchers’ large-scale computational needs with their newly optimized high performance computing system, Summit. Summit is designed with advanced computation, network, and storage architectures to deliver accelerated results for a large range of HPC and big data applications. Summit is built on Dell EMC PowerEdge Servers, Intel Omni-Path Architecture Fabric and Intel Xeon Phi Knights Landing processors.”