Argonne National Laboratory is seeking a Postdoctoral Appointee on FPGAs for Supercomputing in our Job of the Week. “This is an exciting opportunity for you to contribute to a new way of thinking in high-performance computing (HPC) by marrying state-of-the-art reconfigurable hardware with modern performance-portable programming models. This research will combine advances in high-level synthesis for field-programmable gate arrays (FPGAs) with the emerging OpenMP 4 programming model, thus enabling existing HPC codes to take advantage of the advanced floating-point support available in modern FPGA designs.”
“While new technology will be developed that reduces the power per operation needed, in today’s environments it is important to understand how an application affects power usage. For modern applications that have been optimized to take advantage of both the Intel Xeon CPU and the Intel Xeon Phi coprocessor, the hardware mentioned does include various power states, which can minimize the power consumption when idle.”
Through the microarchitecture improvements, increased core counts, and faster memory speeds of the new Intel Xeon processor E5-2600 v4 product family based on the “Broadwell” microarchitecture, you can increase your HPC application performance. You will see significantly improved per-core performance with these just-announced Intel® Xeon® processors, a gain that parallel programs can then multiply across the cores available inside these processors. Improvements to the memory and virtual memory capabilities, including the ability to use faster DDR4-2400 memory, mean that these processors can speed up all aspects of your application, from I/O DMA operations and serial sections of code to task- and data-parallel workloads.
Kenneth Hoste from Ghent University presented this tutorial at the Switzerland HPC Conference. “One unnecessarily time-consuming task for HPC user support teams is installing software for users. Due to the advanced nature of a supercomputing system (think: multiple multi-core modern microprocessors (possibly next to co-processors like GPUs), the availability of a high performance network interconnect, bleeding edge compilers & libraries, etc.), compiling the software from source on the actual operating system and system architecture that it is going to be used on is typically highly preferred over using readily available binary packages that were built in a generic way.”
Funded by the European Commission in 2011, the DEEP project was the brainchild of scientists and researchers at the Jülich Supercomputing Centre (JSC) in Germany. The basic idea is to overcome the limitations of standard HPC systems by building a new type of heterogeneous architecture, one that can dynamically divide the less parallel and highly parallel parts of a workload between a general-purpose Cluster and a Booster, an autonomous cluster with Intel® Xeon Phi™ processors designed to dramatically improve the performance of highly parallel code.
“It is important to be able to express algorithms, and then the code, in an architecture-independent manner to gain maximum portability. Vectorization, using the available CPUs and coprocessors such as the Intel Xeon Phi coprocessor, is critical for HPC applications where performance is of the highest importance. However, since architectures change over time and become more powerful, using libraries that can adjust to the new architectures is quite important.”
Later in 2016, Intel is expected to release production versions of its 72-core Knights Landing (KNL) coprocessor. These next-generation coprocessors are affecting the physical design of the supercomputers now coming down the pike in a number of ways. One of the most dramatic changes is the significant increase in cooling requirements: these are high-wattage chips that run very hot and present some interesting engineering challenges for system designers.
Even though it’s a new-generation fabric, Intel OPA is still backwards compatible with the many applications in the HPC community that were written using the OpenFabrics Alliance software stack for InfiniBand. Existing InfiniBand users will therefore be able to run their codes that are based on the OpenFabrics Enterprise Distribution (OFED) software on Intel OPA. Additionally, Intel has open sourced the key software elements of its fabric to allow integration of Intel OPA into the OFED stack, which several Linux distributions include in their packages.
Threading and vectorization together can increase the performance of an application more than either technique alone. Both are known to increase the performance of an application on modern CPUs and coprocessors. However, a deep understanding of the application is needed in order to make the right decisions and to rewrite portions of the application to take advantage of these techniques. In cases where the developer might not be familiar with the code, an automated tool such as the Intel Vectorization Advisor can assist the developer.
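As a rough illustration of combining the two techniques, the sketch below uses the OpenMP 4.0 combined `parallel for simd` construct on a simple scaled vector addition. The function name `saxpy` and its parameters are chosen here for illustration and do not come from the article; this is one possible way to express the idea, not a tuned implementation.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * "parallel for" divides the iterations among threads, and "simd"
 * asks the compiler to vectorize each thread's share of the loop.
 * Illustrative assumption: x and y do not overlap in memory. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC or Clang, building with `-fopenmp` enables the directive. Without it, the pragma is ignored and the loop runs serially with the same result, which makes it easy to compare the two builds.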
“Vector instruction sets have progressed over time, and it is important to use the most appropriate vector instruction set when running on specific hardware. The OpenMP SIMD directive allows the developer to explicitly tell the compiler to vectorize a loop. In this case, human intervention will override the compiler’s sense of dependencies, but that is OK if the developer knows their application well.”
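A minimal sketch of what overriding the compiler's dependence analysis can look like: in the loop below, the compiler cannot know whether two iterations write to the same element of `dst`, so it would normally be cautious about vectorizing. The `simd` directive asserts that it is safe. The function `scatter_add` and its arguments are hypothetical names used only for this example, and the guarantee that `idx` contains no repeated entries is an assumption the developer must uphold.

```c
/* Adds src[i] into dst[idx[i]] for i in [0, n).
 * Assumption (not checkable by the compiler): every entry of idx is
 * distinct, so no two iterations touch the same element of dst.
 * If idx contained duplicates, vectorizing this loop could produce
 * wrong results, which is why the developer must know the data. */
void scatter_add(int n, const int *idx, const float *src, float *dst)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        dst[idx[i]] += src[i];
    }
}
```

GCC and Clang accept `-fopenmp-simd` to honor only the SIMD directives without enabling threading. Which vector instruction set is actually used still depends on the target flags given to the compiler.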