As multi-socket and then multi-core systems have become standard, the Message Passing Interface (MPI) has become one of the most popular programming models for applications that run in parallel across many sockets and cores. Shared memory programming interfaces, such as OpenMP, have allowed developers to take advantage of the shared memory within each server in systems that combine many individual servers. Doing so, however, has meant using two different programming models at the same time. The MPI 3.0 standard introduces a new MPI interprocess shared memory extension (MPI SHM).
“Intel’s next-generation Xeon Phi processor family x200 product (code-name Knights Landing) brings in new memory technology, a high-bandwidth, on-package memory called Multi-Channel DRAM (MCDRAM), in addition to the traditional DDR4. MCDRAM is a high-bandwidth (~4x more than DDR4), low-capacity (up to 16GB) memory, packaged with the Knights Landing silicon. MCDRAM can be configured as a third-level cache (memory-side cache), as a distinct NUMA node (allocatable memory), or somewhere in between. With the different memory modes in which the system can be booted, it becomes very challenging from a software perspective to understand the best mode for an application.”
The consensus of the panel was that making full use of Intel SSF requires system thinking at the highest level. This entails deep collaboration with the company’s application end-user customers as well as with its OEM partners, who have to design, build and support these systems at the customer site. Mark Seager commented: “For the high-end we’re going after density and (solving) the power problem to create very dense solutions that, in many cases, are water-cooled going forward. We are also asking how can we do a less dense design where cost is more of a driver.” In the latter case, lower end solutions can relinquish some scalability features while still retaining application efficiency.
Matrix multiplies can be decomposed into tiles and executed very fast on the latest generations of coprocessors. Intel has developed the hStreams library that supports task concurrency on heterogeneous platforms. The concurrency may be across nodes (Xeon, KNC, KNL-SB, KNL-LB); within a node for small matrix operations; and in the overlapping of computation and communication, particularly for tiled solutions. It relieves the user of complexity in dealing with thread affinitization, offloading, memory types, and memory affinitization.
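The tiling idea described above can be sketched in a few lines. This is only an illustration of decomposing a matrix multiply into independent tile tasks. It does not use the hStreams API, and the names `multiply_tile` and `tiled_matmul`, the tile size, and the worker count are all hypothetical choices made for this example. It assumes square matrices stored as lists of lists.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_tile(a, b, row0, col0, tile, n):
    """Compute one output tile of C = A * B whose top-left corner is (row0, col0)."""
    row_end = min(row0 + tile, n)
    col_end = min(col0 + tile, n)
    block = [[0.0] * (col_end - col0) for _ in range(row_end - row0)]
    # Walk the shared dimension one tile at a time and accumulate partial products.
    for k0 in range(0, n, tile):
        k_end = min(k0 + tile, n)
        for i in range(row0, row_end):
            for k in range(k0, k_end):
                a_ik = a[i][k]
                for j in range(col0, col_end):
                    block[i - row0][j - col0] += a_ik * b[k][j]
    return row0, col0, block


def tiled_matmul(a, b, tile=2, workers=4):
    """Multiply two n x n matrices by submitting each output tile as its own task."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Output tiles do not overlap, so the tasks are independent of each other.
        futures = [
            pool.submit(multiply_tile, a, b, r, s, tile, n)
            for r in range(0, n, tile)
            for s in range(0, n, tile)
        ]
        # Copy each finished tile back into its place in the result matrix.
        for fut in futures:
            row0, col0, block = fut.result()
            for di, row in enumerate(block):
                c[row0 + di][col0:col0 + len(row)] = row
    return c
```

Calling `tiled_matmul(a, b, tile=2)` on two 3x3 matrices produces four tile tasks, including the smaller edge tiles. In CPython the threads mainly show the task structure rather than deliver a speedup. On a heterogeneous platform each tile task would instead be dispatched to a compute resource, with data transfers overlapped against computation.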
“In GPAW, the high-level nature of Python allows developers to design the algorithms, while C can be used for the numerically intensive portions of the application through highly optimized math kernels. In this application, the Python portions of the code are serial, which makes offloading to the Intel Xeon Phi coprocessor not feasible. However, an interface has been developed, pyMIC, which allows the application to launch kernels and control data transfers to the coprocessor.”
Today the ASC Student Supercomputer Challenge (ASC16) announced details of its Preliminary Contest, which opened on January 6. College students from around the world were asked to design a high performance computer system that optimizes the HPCG and MASNUM_WAM applications under 3000W, as well as to conduct a DNN performance optimization on a standalone hybrid CPU+MIC platform. All system designs, along with the results and code of the application optimization, are to be submitted by March 2.
Many will be familiar with HPC in industrial or scientific applications, but now HPC is making its impact on something that touches the soul of millions of people every day: music. In an interview, Antonis Karalis, the inventor of HPC for Music, briefly explained how the future of music has been compromised and what steps are being taken to revolutionize music composition and the creative workflow and to deliver new entertainment experiences. Along the way, Karalis is applying cutting-edge computing technologies including Intel Optane 3D memory and the Scalable System Framework.
“There are a number of exciting technologies we should see in 2016, and a leader will be Intel’s next-generation Xeon Phi coprocessor – a hybrid between an accelerator and general purpose processor. This new class of processors will have a large impact on the industry with its innovative design that combines a many-core architecture with general-purpose productivity. Cray, for example, will be delivering Intel Xeon Phi processors with some of our largest systems, including those going to Los Alamos National Labs (the “Trinity” supercomputer) and NERSC (the “Cori” supercomputer).”
“In the case of the Intel Xeon Phi coprocessor, although 60 cores are commonly used for computation, there is another core that is available but not traditionally used as part of a simulation. Experiments using the 61st core for actual computation while running a reverse Monte Carlo ray tracing application for the modeling of radiative heat transfer demonstrated that the use of another core improved performance, and that oversubscribing the coprocessor operating system thread did not degrade the performance.”
“SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA) or a (bidirectional) FM-index, as well as algorithms for fast and accurate alignment or read mapping. Based on those data types and fast I/O routines, users can easily develop tools that are extremely efficient and easy to maintain. Besides multi-core, the research team at Freie Universität Berlin has started generic support for distinguished accelerators such as Intel Xeon Phi in a new IPCC. In this talk we will introduce SeqAn and its generic design, describe successful applications that use SeqAn, and describe how SeqAn will incorporate SIMD and multicore parallelism for its core data structures using the pairwise alignment module as an example.”