Researchers Tune HPC Codes for Intel Xeon Phi at Brookhaven Hackathon

“The goal of this hands-on workshop was to help participants optimize their application codes to exploit the different levels of parallelism and memory hierarchies in the Xeon Phi architecture,” said CSI computational scientist Meifeng Lin. “By the end of the hackathon, the participants had not only made their codes run more efficiently on Xeon Phi–based systems, but also learned about strategies that could be applied to other CPU-based systems to improve code performance.”

Intel AVX Gives Numerical Computations in Java a Big Boost

Recent Intel® enhancements to Java enable faster and better numerical computing. In particular, the Java Virtual Machine (JVM) now uses the Fused Multiply Add (FMA) instructions on Intel® Xeon Phi™ processors with Intel® Advanced Vector Extensions (Intel® AVX) to implement the OpenJDK 9 Math.fma() API. This gives significant performance improvements for matrix multiplication, the core computation in most HPC, Machine Learning, and AI applications.
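
To make that concrete, here is a minimal sketch of such a kernel (the class name, method, and problem size are illustrative, not taken from the article): Math.fma(a, b, c) computes a*b + c with a single rounding step, and on FMA-capable hardware the JIT compiler can map each call to one fused multiply-add instruction.

    // Minimal sketch (names and sizes are illustrative): a matrix-multiply
    // kernel written around Math.fma(), added in OpenJDK 9. On FMA-capable
    // hardware the JIT can compile each call to a single fused multiply-add
    // instruction, with one rounding step instead of two.
    public class FmaMatMul {
        // C += A * B for n x n matrices stored in row-major order.
        static void matmul(double[] a, double[] b, double[] c, int n) {
            for (int i = 0; i < n; i++) {
                for (int k = 0; k < n; k++) {
                    double aik = a[i * n + k];
                    for (int j = 0; j < n; j++) {
                        c[i * n + j] = Math.fma(aik, b[k * n + j], c[i * n + j]);
                    }
                }
            }
        }

        public static void main(String[] args) {
            int n = 512;
            double[] a = new double[n * n], b = new double[n * n], c = new double[n * n];
            java.util.Arrays.fill(a, 1.0);
            java.util.Arrays.fill(b, 2.0);
            matmul(a, b, c, n);
            System.out.println(c[0]); // n * 1.0 * 2.0 = 1024.0
        }
    }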

XSEDE Offers Free HPC Training from Cornell Virtual Workshop

Today Cornell University announced that four new Cornell Virtual Workshop training topics are available at the Extreme Science and Engineering Discovery Environment (XSEDE) user portal. “The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs designed to enhance the computational skills of researchers, broaden the participation of underrepresented groups in the sciences and engineering, and accelerate the adoption of new and emerging technologies.”

Performance Insights Using the Intel Advisor Python API

Tuning a complex application for today’s heterogeneous platforms requires an understanding of the application itself, as well as familiarity with tools that help pinpoint where in the code to look for bottlenecks. In general, optimizing the performance of an application follows a series of steps that apply to a wide range of applications.

Call for Papers: International Workshop on Accelerators and Hybrid Exascale Systems

The eighth annual International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) has issued its Call for Papers. Held in conjunction with the 32nd IEEE International Parallel and Distributed Processing Symposium, the AsHES Workshop takes place May 23 in Vancouver, Canada. “This workshop focuses on understanding the implications of accelerators and heterogeneous designs on the hardware systems, porting applications, performing compiler optimizations, and developing programming environments for current and emerging systems. It seeks to ground accelerator research through studies of application kernels or whole applications on such systems, as well as tools and libraries that improve the performance and productivity of applications on these systems.”

Trinity Supercomputer Lands at #7 on TOP500

The Trinity Supercomputer at Los Alamos National Laboratory was recently named a top 10 supercomputer on two lists: it came in at number three on the High Performance Conjugate Gradients (HPCG) benchmark and number seven on the TOP500 list. “Trinity has already made unique contributions to important national security challenges, and we look forward to Trinity having a long tenure as one of the most powerful supercomputers in the world,” said John Sarrao, associate director for Theory, Simulation and Computation at Los Alamos.

Supermicro Booth Tour Showcases HPC Innovation at SC17

In this video, Akira Sano provides a tour of the Supermicro booth at SC17 in Denver. “Supermicro offers the best selection of leading HPC optimized servers in the industry as evidenced by the recent selection of our twin architecture by the NASA Center for Climate Simulation (NCCS).”

Podcast: Optimizing Cosmos Code on Intel Xeon Phi

In this TACC podcast, Cosmos code developer Chris Fragile joins TACC’s Damon McDougall and host Jorge Salazar for a discussion of how researchers are using supercomputers to simulate the inner workings of black holes. “For this simulation, the manycore architecture of KNL presents new challenges for researchers trying to get the best compute performance. This is a computer chip that has lots of cores compared to some of the other chips one might have interacted with on other systems,” McDougall explained. “More attention needs to be paid to the design of software to run effectively on those types of chips.”
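
As a generic illustration of that design point (this is a sketch, not code from Cosmos or TACC; the class name and problem size are hypothetical), the snippet below uses Java parallel streams to spread an embarrassingly parallel loop across all available cores.

    import java.util.stream.IntStream;

    // Illustrative sketch only: on a manycore chip, per-element work has to
    // be spread across many hardware threads to keep the cores busy.
    public class ManycoreSketch {
        public static void main(String[] args) {
            int n = 1 << 24;
            double[] grid = new double[n];

            // Parallel streams split the index range across the common
            // fork/join pool, which scales with the number of cores.
            IntStream.range(0, n).parallel()
                     .forEach(i -> grid[i] = Math.sin(i * 1.0e-6));

            System.out.println("available cores: "
                    + Runtime.getRuntime().availableProcessors());
        }
    }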

1000x Faster Deep Learning at Petascale Using Intel Xeon Phi Processors

A cumulative effort over several years to scale the training of deep-learning neural networks has resulted in the first demonstration of petascale deep-learning training performance, and, further, in delivering that performance on real science problems. The result reflects the combined efforts of NERSC (the National Energy Research Scientific Computing Center), Stanford, and Intel to solve real-world use cases rather than simply report performance benchmarks.

Intel Parallel Studio XE 2018 for Demanding HPC Applications

“For those who develop HPC applications, there are usually two main areas to consider. The first is the translation of the algorithm, whether simulation based, physics based, or pure research, into code that a modern computer system can run. The second is how to move from the implementation of an algorithm to performance that takes advantage of modern CPUs and accelerators.”
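
A small, concrete instance of that second challenge (a hedged sketch, not an example from Intel Parallel Studio XE; the class, methods, and sizes are hypothetical) is data layout: the same update written with array-of-structures versus structure-of-arrays storage. The structure-of-arrays version gives the unit-stride memory access that modern vectorizing compilers and CPU caches reward.

    // Illustrative sketch: array-of-structures (AoS) vs. structure-of-arrays
    // (SoA). Both are correct implementations of the same algorithm; the SoA
    // loop's unit-stride access is far friendlier to vectorization and caches.
    public class LayoutSketch {
        // AoS: x, y, z interleaved per particle -> stride-3 access to x.
        static void shiftXAos(double[] xyz, double dx, int n) {
            for (int i = 0; i < n; i++) {
                xyz[3 * i] += dx;
            }
        }

        // SoA: all x coordinates contiguous -> unit-stride access.
        static void shiftXSoa(double[] x, double dx, int n) {
            for (int i = 0; i < n; i++) {
                x[i] += dx;
            }
        }

        public static void main(String[] args) {
            int n = 1_000_000;
            shiftXAos(new double[3 * n], 0.5, n);
            shiftXSoa(new double[n], 0.5, n);
            System.out.println("updated " + n + " particles in both layouts");
        }
    }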