In this special guest feature, Linda Barney writes that researchers at the University of Cambridge are using an Intel Xeon Phi coprocessor-based supercomputer from SGI to accelerate discovery efforts. “We have managed to modernize and optimize the main workhorse code used in the research so it now runs at 1/100-1/1000 of the original runtime. This allows us to tackle problems which would have taken unfeasibly long to solve. Secondly, it has opened windows for previously unthinkable research, namely using the MODAL code in cosmological parameter search: this is a problem which is constantly being solved in an iterative process, but adding the MODAL results to the process has only become possible with the improved performance.”
Professor Taisuke Boku from the University of Tsukuba presented this talk at the PBS User Group. “We have been operating HA-PACS, a large-scale GPU cluster with 332 compute nodes and 1,328 GPUs, managed by the PBS Professional scheduler. Users come from a wide variety of computational science fields, with resource requests ranging from a single node to full-scale parallel runs. There are also several categories of user groups, with both paid and free scientific projects. Operating such a large system is challenging: we must keep the system utilization rate high while remaining fair across these user groups. We have successfully maintained 85-90% job utilization under these multiple constraints.”
NOAA will acquire software engineering support and associated tools to re-architect its applications to run efficiently on next-generation fine-grain HPC architectures. A recent procurement document defines the target: “Fine-grain architecture (FGA) is defined as: a processing unit that supports more than 60 concurrent threads in hardware (e.g. a GPU or a large core-count device).”
“SpMV can be parallelized using OpenMP directives. By allocating memory for each core, data races can be eliminated and data locality can be exploited, leading to higher performance. Besides running on the main CPU, the computation can be vectorized on the Intel Xeon Phi coprocessor. By blocking the data into chunks of various sizes, different implementations can be run and evaluated on the coprocessor.”
Jack Dongarra from the University of Tennessee will keynote the first-ever Intel HPC Developer Conference, Nov. 14-15 in Austin, Texas. “The Intel HPC Developer Conference offers high levels of access. This is your best opportunity to connect with Intel architecture experts, meet HPC industry leaders, and build a lasting network of peers. The conference will also offer insights into the future of HPC with Intel experts on visualization, machine learning, software tools and much more.”
One of the most common operations in numerical simulation is solving large, dense systems of linear equations. Thermal analysis, boundary element methods, and electromagnetic wave calculations all depend on the ability to solve these systems as quickly as possible. A coprocessor such as the Intel Xeon Phi can greatly accelerate these calculations.
Today SGI and the IT4Innovations national supercomputing center in the Czech Republic announced the deployment of the Salomon supercomputer. With a peak performance of 2 Petaflops, the Salomon supercomputer is twenty times more powerful than its predecessor and is the most powerful supercomputer in Europe equipped with Intel Xeon Phi coprocessors.
Through profiling, developers and users can see where an application’s hotspots are, so that the most time-consuming sections of code can be optimized first. Beyond locating where time is spent, profiling tools can reveal regions with little or no parallelism, along with other factors that limit performance. Performance tuning guided by this information can yield substantial speedups.
European researchers are welcome to use the world’s fastest supercomputer, the Tianhe-2, to pursue their research in collaboration with Chinese scientists and HPC specialists. “Enough Ivy Bridge Xeon E5 2692 processors had already been delivered to allow the Tianhe-2 to be upgraded from its current 55 Petaflops peak performance to the 100 Petaflops mark.”