
Researching Origins of the Universe at the Stephen Hawking Centre for Theoretical Cosmology

The time for the original code running on two Intel® Xeon® processors is 2887.0 seconds; the time for the first version of the code compatible with Intel® Xeon Phi™ coprocessors is 865.9 seconds on two processors and 1991.6 seconds on one coprocessor.  The final times for the optimized code were 34.3 and 26.6 seconds for two Intel® Xeon® processors and one Intel® Xeon Phi™ coprocessor, respectively. Chart is courtesy of Dr. Juha Jäykkä, Manager of the Intel® PCC at University of Cambridge.

In this special guest feature, Linda Barney writes that researchers at the University of Cambridge are using an Intel Xeon Phi coprocessor-based supercomputer from SGI to accelerate discovery efforts. “We have managed to modernize and optimize the main workhorse code used in the research so it now runs at 1/100-1/1000 of the original runtime. This allows us to tackle problems which would have taken unfeasibly long to solve. Secondly, it has opened windows for previously unthinkable research, namely using the MODAL code in cosmological parameter search: this is a problem which is constantly being solved in an iterative process, but adding the MODAL results to the process has only become possible with the improved performance.”

Case Study: PBS Pro on a Large Scale Scientific GPU Cluster


Professor Taisuke Boku from the University of Tsukuba presented this talk at the PBS User Group. “We have been operating HA-PACS, a large-scale GPU cluster with 332 computation nodes equipped with 1,328 GPUs, managed by the PBS Professional scheduler. The users are spread across a wide variety of computational science fields, with resource requirements ranging from a single node to full-scale parallel processing. There are also several categories of user groups with paid and free scientific projects. Operating such a large system while maintaining a high utilization rate and fairness across these user groups is challenging. We have successfully sustained 85%-90% job utilization under multiple constraints.”

Video: NOAA Software Engineering for Novel Architectures (SENA) Project


“NOAA will acquire software engineering support and associated tools to re-architect NOAA’s applications to run efficiently on next-generation fine-grain HPC architectures.” A recent procurement document defines the term: “Fine-grain architecture (FGA) is defined as: a processing unit that supports more than 60 concurrent threads in hardware (e.g. GPU or a large core-count device).”

Sparse Matrix Multiplication


“SpMV can be parallelized using OpenMP directives. By allocating memory for each core, data races can be eliminated and data locality can be exploited, leading to higher performance. Besides running on the main CPU, vectorization can be implemented on the Intel Xeon Phi coprocessor. By blocking the data into chunks of varying sizes, different implementations on the Intel Xeon Phi coprocessor can be run and evaluated.”

Jack Dongarra to Keynote Intel HPC Developer Conference at SC15

Jack Dongarra, University of Tennessee

Jack Dongarra from the University of Tennessee will keynote the first-ever Intel HPC Developer Conference, Nov. 14-15 in Austin, Texas. “The Intel HPC Developer Conference offers high levels of access. This is your best opportunity to connect with Intel architecture experts, meet HPC industry leaders, and build a lasting network of peers. The conference will also offer insights into the future of HPC with Intel experts on visualization, machine learning, software tools and much more.”

Out of Core Solvers on a Cluster


One of the most common operations in numerical simulation is solving large, dense systems of linear equations. Thermal analysis, boundary element methods, and electromagnetic wave calculations all depend on the ability to solve these large systems as fast as possible. Offloading the work to a coprocessor such as the Intel Xeon Phi coprocessor can greatly speed up these calculations.
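The out-of-core idea, keeping a matrix too large for memory on disk and streaming blocks of it through a small resident buffer, can be sketched with a simple matrix-vector product. This is an illustrative sketch under my own assumptions (file layout, block size, and function name are not from the article); a real out-of-core solver applies the same streaming pattern to the panels of an LU or Cholesky factorization:

```c
#include <stdio.h>
#include <stdlib.h>

/* Out-of-core matrix-vector product y = A*x.  The n-by-n matrix A
   is stored row-major in a binary file and streamed through memory
   blk rows at a time, so only blk*n doubles are resident at once. */
void ooc_matvec(const char *path, int n, int blk,
                const double *x, double *y)
{
    FILE *f = fopen(path, "rb");
    if (!f) { perror(path); exit(1); }

    double *buf = malloc((size_t)blk * n * sizeof(double));
    for (int r0 = 0; r0 < n; r0 += blk) {
        int rows = (r0 + blk <= n) ? blk : n - r0;
        if (fread(buf, sizeof(double), (size_t)rows * n, f)
                != (size_t)rows * n) {
            fprintf(stderr, "short read\n");
            exit(1);
        }
        /* Process the resident block, then let it be overwritten. */
        for (int i = 0; i < rows; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                s += buf[(size_t)i * n + j] * x[j];
            y[r0 + i] = s;
        }
    }
    free(buf);
    fclose(f);
}
```

The design point is that I/O and compute are per-block, so the memory footprint is fixed regardless of the matrix size; in a production solver the disk reads would typically be overlapped with computation on the previous block.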

Czech Republic Steps Up with 2 Petaflop SGI ICE X Supercomputer


Today SGI and the IT4Innovations national supercomputing center in the Czech Republic announced the deployment of the Salomon supercomputer. With a peak performance of 2 Petaflops, the Salomon supercomputer is twenty times more powerful than its predecessor and is the most powerful supercomputer in Europe equipped with Intel Xeon Phi coprocessors.

Optimization Through Profiling


Through profiling, developers and users can see where an application’s hotspots are, in order to optimize those sections of the code. In addition to locating where time is spent within an application, profiling tools can identify regions with little or no parallelism, along with other factors that may affect performance. Performance tuning guided by this data can help tremendously in many cases.

How the QPACE 2 Supercomputer is Solving Quantum Physics with Intel Xeon Phi

QPACE 2 prototype at the University of Regensburg (Image courtesy of Tilo Wettig)

In this special guest feature from Scientific Computing World, Tilo Wettig from the University of Regensburg in Germany describes the unusual design of a supercomputer dedicated to solving some of the most arcane issues in quantum physics.

An Open Invitation to Work on the Tianhe-2 Supercomputer

Dr. Yutong Lu, NUDT

European researchers are welcome to use the world’s fastest supercomputer, the Tianhe-2, to pursue their research in collaboration with Chinese scientists and HPC specialists. “Enough Ivy Bridge Xeon E5 2692 processors had already been delivered to allow the Tianhe-2 to be upgraded from its current 55 Petaflops peak performance to the 100 Petaflops mark.”