In this video from SC15, NERSC shares its experience on optimizing applications to run on the new Intel Xeon Phi processors (code name Knights Landing) that will empower the Cori supercomputer by the summer of 2016. “A key goal of the Cori Phase 1 system is to support the increasingly data-intensive computing needs of NERSC users. Toward this end, Phase 1 of Cori will feature more than 1,400 Intel Haswell compute nodes, each with 128 gigabytes of memory per node. The system will provide about the same sustained application performance as NERSC’s Hopper system, which will be retired later this year. The Cori interconnect will have a dragonfly topology based on the Aries interconnect, identical to NERSC’s Edison system.”
“Aurora’s revolutionary architecture features Intel’s HPC scalable system framework and 2nd generation Intel Omni-Path Fabric. The system will have a combined total of over 8 Petabytes of on package high bandwidth memory and persistent memory, connected and communicating via a high-performance system fabric to achieve landmark throughput. The nodes will be linked to a dedicated burst buffer and a high-performance parallel storage solution. A second system, named Theta, will be delivered in 2016. Theta will be based on Intel’s second-generation Xeon Phi processor and will serve as an early production system for the ALCF.”
hotchipsThe Hot Chips 2016 conference has issues its Call for Proposals. The event takes place August 21-23 in Cupertino, California. “Presentations at HOT CHIPS are in the form of 30 minute talks using PowerPoint or PDF. Presentation slides will be published in the HOT CHIPS Proceedings. Participants are not required to submit written papers, but a select group will be invited to submit a paper for inclusion in a special issue of IEEE Micro.”
Registration is now open for the inaugural Nimbix Developer Summit. With an impressive lineup of speakers & sponsors from Mellanox, migenius, Xilinx, and more, the event takes place March 15 in Dallas, Texas. “The summit agenda will feature topics such as hardware acceleration, coprocessing, photorealistic rendering, bioinformatics, and high performance analytics. The sessions will conclude with a panel of developers discussing how to overcome challenges of creating and optimizing cloud-based applications.”
The consensus of the panel was that making full use of Intel SSF requires system thinking at the highest level. This entails deep collaboration with the company’s application end-user customers as well as with its OEM partners, who have to design, build and support these systems at the customer site. Mark Seager commented: “For the high-end we’re going after density and (solving) the power problem to create very dense solutions that, in many cases, are water-cooled going forward. We are also asking how can we do a less dense design where cost is more of a driver.” In the latter case, lower end solutions can relinquish some scalability features while still retaining application efficiency.
Matrix multiplies can be decomposed into tiles and executed very fast on the latest generations of coprocessors. Intel has developed the hStreams library that supports task concurrency on heterogeneous platforms. The concurrency may be across nodes (Xeon, KNC, KNL-SB, KNL-LB); within a node for small matrix operations; and in the overlapping of computation and communication, particularly for tiled solutions. It relieves the user of complexity in dealing with thread affinitization, offloading, memory types, and memory affinitization.
“The path to Exascale computing is clearly paved with Co-Design architecture. By using a Co-Design approach, the network infrastructure becomes more intelligent, which reduces the overhead on the CPU and streamlines the process of passing data throughout the network. A smart network is the only way that HPC data centers can deal with the massive demands to scale, to deliver constant performance improvements, and to handle exponential data growth.”
A new paper from ORNL’s Sparsh Mittal presents a survey of approximate computing techniques. Recently published in ACM Computing Surveys 2016, A Survey Of Techniques for Approximate Computing reviews nearly 85 papers on this increasingly hot topic.
Today the ASC Student Supercomputer Challenge (ASC16) announced details from their Preliminary Contest on January 6. College students from around the world were asked to design a high performance computer system that optimizes HPCG and MASNUM_WAM applications under 3000W as well as to conduct a DNN performance optimization on a standalone hybrid CPU+MIC platform. All system designs along with the result and the code of the optimization application are to be submitted by March 2.
In this video from the Intel HPC Developer Conference at SC15, James Reinders hosts an Intel Black Belt discussion on Code Modernization. “Modern high performance computers are built with a combination of resources including: multi-core processors, many core processors, large caches, high speed memory, high bandwidth inter-processor communications fabric, and high speed I/O capabilities. High performance software needs to be designed to take full advantage of these wealth of resources. Whether re-architecting and/or tuning existing applications for maximum performance or architecting new applications for existing or future machines, it is critical to be aware of the interplay between programming models and the efficient use of these resources. Consider this a starting point for information regarding Code Modernization. When it comes to performance, your code matters!”