Researchers Tune HPC Codes for Intel Xeon Phi at Brookhaven Hackathon

Xinmin Tian, a senior principal engineer at Intel, gives a presentation on vector programming to help the teams optimize their scientific codes for the Xeon Phi processors.

In this video, researchers join a Hackathon at Brookhaven Lab to tune their HPC codes for the Intel Xeon Phi (Knights Landing) processor.

“The goal of this hands-on workshop was to help participants optimize their application codes to exploit the different levels of parallelism and memory hierarchies in the Xeon Phi architecture,” said CSI computational scientist Meifeng Lin, who co-organized the hackathon with CSI Director Kerstin Kleese van Dam, CSI Computer Science and Mathematics Department Head Barbara Chapman, and CSI computational scientist Martin Kong. “By the end of the hackathon, the participants had not only made their codes run more efficiently on Xeon Phi–based systems, but also learned about strategies that could be applied to other CPU-based systems to improve code performance.”

Last year, Lin was part of the committee that organized Brookhaven’s first hackathon, at which teams learned how to program their scientific applications on computing devices called graphics processing units (GPUs). As was the case for that hackathon, this one was open to any current or potential user of the hardware. In the end, five teams of three to four members each—representing Brookhaven Lab, the Institute for Mathematical Sciences in India, McGill University, Stony Brook University, University of Miami, University of Washington, and Yale University—were accepted to participate in the Intel Xeon Phi hackathon.

From February 26 through March 2, nearly 20 users of Xeon Phi–based supercomputers came together at Brookhaven Lab to be mentored by computing experts from Brookhaven and Lawrence Berkeley national labs, Indiana University, Princeton University, University of Bielefeld in Germany, and University of California–Berkeley. The hackathon organizing committee selected the mentors based on their experience in Xeon Phi optimization and shared-memory parallel programming with the OpenMP (for Multi-Processing) industry standard.

Participants did not need to have prior Xeon Phi experience to attend. Several weeks prior to the hackathon, the teams were assigned to mentors with scientific backgrounds relevant to the respective application codes. The mentors and teams then held a series of meetings to discuss the limitations of their existing codes and goals at the hackathon. In addition to their specific mentors, the teams had access to four Intel technical experts with backgrounds in programming and scientific domains. These Intel experts served as floating mentors during the event to provide expertise in hardware architecture and performance optimization.

The hackathon provided an excellent opportunity for application developers to talk and work with Intel experts directly,” said mentor Bei Wang, a HPC software engineer at Princeton University. “The result was a significant speed up in the time it takes to optimize code, thus helping application teams achieve their science goals at a faster pace. Events like this hackathon are of great value to both scientists and vendors.”

The five codes that were optimized cover a wide variety of applications:

  • A code for tracking particle-device and particle-particle interactions that has the potential to be used as the design platform for future particle accelerators
  • A code for simulating the evolution of the quark-gluon plasma (a hot, dense state of matter thought to have been present for a few millionths of a second after the Big Bang) produced through high-energy collisions at Brookhaven’s Relativistic Heavy Ion Collider (RHIC)—a DOE Office of Science User Facility
  • An algorithm for sorting records from databases, such as DNA sequences to identify inherited genetic variations and disorders
  • A code for simulating the formation of structures in the universe, particularly galaxy clusters
  • A code for simulating the interactions between quarks and gluons in real time

Large-scale numerical simulations are required to describe the matter created at the earliest times after the collision of two heavy ions,” said team member Mark Mace, a PhD candidate in the Nuclear Theory Group in the Physics and Astronomy Department at Stony Brook University and the Nuclear Theory Group in the Physics Department at Brookhaven Lab. “My team had a really successful week—we were able to make our code run much faster (20x), and this improvement is a game changer as far as the physics we can study with the resources we have. We will now be able to more accurately describe the matter created after heavy-ion collisions, study a larger array of macroscopic phenomena observed in such collisions, and make quantitative predictions for experiments at RHIC and the Large Hadron Collider in Europe.”

According to Lin, the hackathon was highly successful—all five teams improved the performance of their codes, achieving from 2x to 40x speedups.

It is expected that Intel Xeon Phi–based computing resources will continue operating until the next-generation exascale computers come online,” said Lin. “It is important that users can make these systems work to their full potential for their specific applications.”

Source: Brookhaven National Lab

Sign up for our insideHPC Newsletter