Today the Barcelona Supercomputing Center announced plans for MareNostrum 4, a 13.7-petaflop supercomputer that will be 12.4 times more powerful than the current MareNostrum 3 system. In a contract valued at almost €30 million, IBM will integrate its own technologies alongside those of Lenovo, Intel, and Fujitsu into a single machine.
Over at the ARM Connected Community, Darren Cepulis writes that the popular chip platform is now part of the OpenHPC community. As one of a series of strategic moves, the effort should help bolster ARM as a platform for high performance computing.
Yutaka Ishikawa from RIKEN AICS presented this talk at the HPC User Forum. “Slated for delivery sometime around 2022, the ARM-based Post-K Computer has a performance target of being 100 times faster than the original K computer within a power envelope that will only be 3-4 times that of its predecessor. RIKEN AICS has been appointed as the main organization for leading the development of the Post-K.”
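As a rough reading of the Post-K targets quoted above, the implied gain in performance per watt can be worked out directly. This is only a back-of-the-envelope sketch: the function name is hypothetical, and it assumes the "100 times faster" and "3-4 times" power figures can simply be divided, which the talk summary does not state.

```python
def efficiency_gain(speedup: float, power_ratio: float) -> float:
    """Implied improvement in performance per watt (assumed simple ratio)."""
    return speedup / power_ratio

# Quoted targets: 100x the K computer's performance at 3-4x its power.
low = efficiency_gain(100, 4)    # power at the top of the quoted range
high = efficiency_gain(100, 3)   # power at the bottom of the quoted range

print(f"Implied efficiency gain: {low:.1f}x to {high:.1f}x")
```

Under those assumptions, the targets imply roughly a 25x to 33x improvement in energy efficiency over the original K computer.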
This week at the Hot Chips conference, Phytium Technology from China unveiled a 64-core CPU and a related prototype computer server. “Phytium says the new CPU chip, with 64-bit arithmetic compatible with ARMv8 instructions, is able to perform 512 GFLOPS at base frequency of 2.0 GHz and on 100 watts of power dissipation.”
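The Phytium figures quoted above can be sanity-checked with simple arithmetic. The sketch below assumes the 512 GFLOPS number is a theoretical peak spread evenly across all 64 cores at the 2.0 GHz base frequency; the announcement does not spell this out, and the function name is hypothetical.

```python
def flops_per_core_per_cycle(peak_gflops: float, cores: int, ghz: float) -> float:
    """Floating-point operations per core per clock cycle implied by a peak figure."""
    return peak_gflops / (cores * ghz)

# Quoted figures: 512 GFLOPS, 64 cores, 2.0 GHz base frequency, 100 W.
per_cycle = flops_per_core_per_cycle(512.0, 64, 2.0)
gflops_per_watt = 512.0 / 100.0

print(f"{per_cycle} FLOPs per core per cycle")
print(f"{gflops_per_watt} GFLOPS per watt")
```

Under those assumptions, each core would need to retire four floating-point operations per cycle, and the chip would deliver about 5 GFLOPS per watt.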
The Fujitsu Journal has posted details on a recent Hot Chips presentation by Toshio Yoshida about the instruction set architecture (ISA) of the Post-K processor. “The Post-K processor employs the ARM ISA, developed by ARM Ltd., with enhancements for supercomputer use. Meanwhile, Fujitsu has been developing the microarchitecture of the processor. In Fujitsu’s presentation, we also explained that our development of mainframe processors and UNIX server SPARC processors will continue into the future. The reason that Fujitsu is able to continuously develop multiple processors is our shared microarchitecture approach to processor development.”
Over at the ARM Community Blog, Nigel Stephens writes that the company has introduced scalable vector extensions (SVE) for its A64 instruction set to bolster high performance computing. Fujitsu is developing a new HPC processor conforming to ARMv8-A with SVE for the Post-K computer.
ARM processors will provide the computational muscle behind one of the most powerful supercomputers in the world, replacing the current K computer at the RIKEN Advanced Institute for Computational Science (AICS) in Japan. During the ISC conference, Fujitsu released details of the new system in a presentation by Fujitsu vice president Toshiyuki Shimizu. Shimizu stated that the “post K” system, which is set to go live in 2020, will have 100 times more application performance than the K supercomputer.
Now that ARM has been acquired, the big question is how much the SoftBank investment firm will invest in bolstering ARM chips for HPC. Meanwhile, ARM continues to gain traction, as evidenced by today’s announcement that a paper on the ARM-based Mont-Blanc Project has been selected as a Best Paper Finalist for SC16. Entitled “The Mont-Blanc prototype: An Alternative Approach for HPC Systems,” the paper was written by Nikola Rajovic, a BSC researcher involved in the Mont-Blanc project since its beginnings.
Today Cavium announced ThunderX2, its second generation of Workload-Optimized ARM server SoCs. ThunderX2 targets high performance volume servers deployed by Public/Private Cloud and Telco data centers and high performance computing applications. “Optimized for key Data Center workloads, ThunderX2 will deliver comparable performance at a better total cost of ownership compared to the next generation of traditional server processors.”
“Unified Communication X (UCX) is a set of network APIs and their implementations for high performance computing. UCX comes from the combined efforts of national laboratories, industry, and academia to co-design and implement a high-performing and highly scalable communication APIs for next generation applications and systems. UCX solves the problem of moving data from memory location “A” to memory location “B” considering across multiple type of memories (DRAM, accelerator memories, etc.) and multiple transports (e.g. InfiniBand, uGNI, Shared Memory, CUDA, etc.), while minimizing latency, and maximizing bandwidth and message rate.”