Used correctly, vector instructions can speed up applications tremendously. The benefit is that far more work can be done per clock cycle than by performing operations one at a time. The Intel Xeon Phi coprocessor was designed with strong support for vector-level parallelism. “When these techniques are used either individually or in combination in different areas of the application, the performance will surely be increased, in many cases without a lot of effort.”
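As a rough illustration of the kind of loop that benefits from vector instructions, the sketch below updates an array in place with one multiply-add per element. The function name `saxpy` and its parameters are illustrative assumptions, not code from the article; the `#pragma omp simd` line is only a hint and is ignored by compilers built without OpenMP SIMD support.

```c
#include <stddef.h>

/* Illustrative helper (hypothetical name): y[i] = a * x[i] + y[i].
   The restrict qualifiers promise the compiler that x and y do not
   overlap, which removes a common obstacle to vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Hint that iterations are independent and may be executed
       several at a time in vector registers. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Each iteration is independent of the others, so a vectorizing compiler can process several elements per instruction instead of one at a time.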
Pat McGarry from Ryft presented this talk at the HPC User Forum in Tucson. “Years in the making, the Ryft ONE combines two proven innovations in hardware and software to optimize compute, storage and I/O performance: the Ryft Hybrid FPGA/x86 Compute Platform, which leverages a massively parallel bitwise computing architecture, and the Ryft Algorithmic Primitives (RAP) Library.”
In this podcast, the Radio Free HPC team looks at the news highlights for the week leading up to Friday the 13th of May, 2016. Highlights include a 25 Petaflop Fujitsu supercomputer coming to Japan, an OpenPOWER Summit coming to Europe, and fighting the Zombie Apocalypse with HPC.
“The next step is to look at using OpenMP directives to create multiple threads and distribute the work over many cores. A key OpenMP directive, #pragma omp for collapse, will collapse the inner two loops into one. The developer can then set the number of threads and cores to use and rerun the application to determine the performance. In one test case, three threads per physical core shows the best performance, by quite a lot compared to just using one or two threads per core.”
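The loop structure described in the quote can be sketched as follows. This is a minimal example under stated assumptions, not the code from the article: the function name `scale_steps`, the grid layout, and the per-element work are all hypothetical. The `collapse(2)` clause merges the two inner loops into a single iteration space that is divided among the threads.

```c
#include <stddef.h>

/* Hypothetical kernel: multiply every cell of a rows x cols grid by
   factor, repeated for a number of steps. Compiles and runs serially
   if OpenMP is not enabled, because the pragmas are then ignored. */
void scale_steps(int steps, int rows, int cols, double factor, double *grid)
{
    #pragma omp parallel
    for (int s = 0; s < steps; ++s) {
        /* Collapse the two inner loops into one iteration space of
           rows * cols iterations and share it among the threads.
           The implicit barrier at the end of the worksharing loop
           keeps the steps in order. */
        #pragma omp for collapse(2)
        for (int r = 0; r < rows; ++r) {
            for (int c = 0; c < cols; ++c) {
                grid[(size_t)r * cols + c] *= factor;
            }
        }
    }
}
```

With an OpenMP-enabled build, the thread count can be varied between runs through the standard `OMP_NUM_THREADS` environment variable, which is one way to compare one, two, and three threads per core.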
Today Fujitsu announced an order for a 25 Petaflop supercomputer system from the University of Tokyo and the University of Tsukuba. Powered by Intel Knights Landing processors, the “T2K Open Supercomputer” will be deployed at the Joint Center for Advanced High-Performance Computing (JCAHPC), which the two universities jointly operate. “The new supercomputer will be an x86 cluster system consisting of 8,208 of the latest FUJITSU Server PRIMERGY x86 servers running next-generation Intel Xeon Phi processors. Due to be completely operational in December 2016, the system is expected to be Japan’s highest-performance supercomputer.”
“The Intel Xeon Phi coprocessor is an example of a many core system that can greatly increase the performance of an application when used correctly. Simply taking a serial application and expecting tremendous performance gains will not happen. Rewriting parts of the application will be necessary to take advantage of the architecture of the Intel Xeon Phi coprocessor.”
“Parallel software and parallel hardware, used together, will give the best results for an application. If the application is serial in nature, and the processor is serial, then there will obviously not be a great gain in performance. When the application is parallelized, but the processor is serial, again, no great gain. A third combination is when the application is serial and the processor is parallel. Since the application cannot take advantage of the increased power of the hardware, there will not be a great performance boost. The best and really only solution is to modify the application to run in parallel, using high-performing parallel hardware.”
“As clock speeds for CPUs have not been increasing as they did a decade ago, chip designers have been enhancing the performance of both CPUs, such as the Intel Xeon, and the Intel Xeon Phi coprocessor by adding more cores. New designs allow applications to perform more work in parallel, reducing the overall time to perform a simulation, for example. However, to get this increase in performance, applications must be designed or reworked to take advantage of these new designs, which can include hundreds to thousands of cores in a single computer system.”
A research team at the Ohio Supercomputer Center (OSC) is beginning the task of modernizing a software package that uses large-scale 3-D modeling for fatigue and fracture analyses, primarily in metals. “The research is a result of OSC being selected as an Intel Parallel Computing Center. The Intel PCC program provides funding to universities, institutions and research labs to modernize key community codes used across a wide range of disciplines to run on current state-of-the-art parallel architectures. The primary focus is to modernize applications to increase parallelism and scalability through optimizations that leverage cores, caches, threads and vector capabilities of microprocessors and coprocessors.”
Argonne National Laboratory is seeking a Postdoctoral Appointee on FPGAs for Supercomputing in our Job of the Week. “This is an exciting opportunity for you to contribute to a new way of thinking in high-performance computing (HPC) by marrying state-of-the-art reconfigurable hardware with modern performance-portable programming models. This research will combine advances in high-level synthesis for field-programmable gate arrays (FPGAs) with the emerging OpenMP 4 programming model, thus enabling existing HPC codes to take advantage of the advanced floating-point support available in modern FPGA designs.”