Matrix multiplies can be decomposed into tiles and executed very fast on the latest generations of coprocessors. Intel has developed the hStreams library that supports task concurrency on heterogeneous platforms. The concurrency may be across nodes (Xeon, KNC, KNL-SB, KNL-LB); within a node for small matrix operations; and in the overlapping of computation and communication, particularly for tiled solutions. The library relieves the user of the complexity of dealing with thread affinitization, offloading, memory types, and memory affinitization.
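To make the tiling idea concrete, here is a minimal sketch in plain Python of a matrix multiply decomposed into tiles. It does not use hStreams or any Intel API; the function name `tiled_matmul` and the `tile` parameter are hypothetical, chosen only for illustration. Each (i0, j0, p0) triple in the loop nest identifies one independent tile-level update, which is the kind of unit a tasking runtime could schedule concurrently.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), both lists of lists, one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer three loops walk over tiles; each iteration is one tile-level task:
    #   C[i0:i0+tile, j0:j0+tile] += A[i0:i0+tile, p0:p0+tile] @ B[p0:p0+tile, j0:j0+tile]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner three loops do the small multiply inside the tile,
                # clamping at the matrix edges when sizes are not multiples of tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b))
```

Tiles that write to different blocks of the result are independent of one another, so in principle they can run at the same time while the data for the next tile is still being transferred. That is the overlap of computation and communication the paragraph above refers to.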
“What we’re showcasing this year is – what we’re jokingly calling – face-melting performance. What we’re trying to do is make extreme performance available at a very aggressive price point, and at a very aggressive space point, for end users. So, what we’ve been doing and what we’ve been working on for the past couple of months has been, basically, building an NVMe-type unit. This NVMe unit connects flash devices through a PCIe interface to the processor complex.”
Bull Atos has installed the fastest supercomputer in Croatia at the University in Rijeka. With an expected debut on the TOP500 in November, the 239 Teraflop BURA supercomputer will be used by university researchers in areas including drug discovery and genomics.
“Modern Numerical Weather prediction (NWP) can now use many thousands of cores in a single run of the application. By using modern CPUs such as the Intel Xeon processors and the Intel Xeon Phi coprocessors, tremendous performance and efficiency can be obtained. It is important to remember that many of the applications are written in Fortran and many of the contributors are domain experts, not parallel programming gurus.”
Today Norway’s Dolphin Interconnect Solutions demonstrated a record low latency of 300 nanoseconds at IDF 2015. Dolphin achieved this record by adding Intel Xeon Non Transparent Bridging (NTB) support to its existing PCI Express network product. In addition, Dolphin announced a new PCIe 3.0 host adapter, the PXH810 Host Adapter, which achieves 540 nanoseconds of latency at 64Gbps wire speeds.
Today Colfax International announced free online workshops on parallel programming and optimization for Intel architecture, including Intel Xeon processors and Intel Xeon Phi coprocessors. “The Hands-on Workshop (HOW) series will introduce best practices to researchers and developers to efficiently extract maximum performance out of modern parallel processors, achieving shorter time to solution, higher research productivity, and future-proof design.”
NASA reports that its newly upgraded Pleiades supercomputer ranks number 11 on the July 2015 TOP500 list of the most powerful supercomputers. And while the LINPACK computing power of Pleiades jumped nearly 21 percent, its ranking at number 5 on the new HPCG benchmark list reflects its ability to tackle real-world applications.
In this special guest feature, John Kirkley writes that Intel is using its new Omni-Path Architecture as a foundation for supercomputing systems that will scale to 200 Petaflops and beyond. “With its ability to scale to tens and eventually hundreds of thousands of nodes, the Intel Omni-Path Architecture is designed for tomorrow’s HPC workloads. The platform has its sights set squarely on Exascale performance while supporting more modest, but still demanding, future HPC implementations.”
On Monday, the Leibniz Supercomputing Centre (LRZ) celebrated the expansion of its SuperMUC cluster. Now in production mode, the 6.8 Petaflop “Phase 2” supercomputer is powered by over 241,000 Intel processor cores.
Today Idaho National Laboratory (INL) announced that the lab has deployed an SGI ICE X supercomputer to power nuclear reactor simulations. Supplied through SGI’s partner ComnetCo, the 511 Teraflop SGI ICE X cluster comprises 611 water-cooled nodes using Intel Xeon E5-2600 v3 processors.