This week at the Hot Chips conference, Phytium Technology from China unveiled a 64-core CPU and a related prototype computer server. “Phytium says the new CPU chip, with 64-bit arithmetic compatible with ARMv8 instructions, is able to perform 512 GFLOPS at base frequency of 2.0 GHz and on 100 watts of power dissipation.”
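The quoted figures can be reduced to per-core and per-watt terms with simple arithmetic. The sketch below is a back-of-the-envelope check, not a vendor specification: it assumes the 512 GFLOPS figure is a peak rate spread evenly across all 64 cores at the 2.0 GHz base frequency, and the function names are made up for illustration. The source does not state the floating-point precision, so none is assumed here.

```python
# Figures quoted in the announcement (treated here as peak values; an assumption).
PEAK_GFLOPS = 512.0   # quoted throughput
CORES = 64            # quoted core count
BASE_GHZ = 2.0        # quoted base frequency
POWER_WATTS = 100.0   # quoted power dissipation


def flops_per_cycle_per_core(peak_gflops, cores, ghz):
    """Floating-point operations each core must retire per clock cycle."""
    return peak_gflops / (cores * ghz)


def gflops_per_watt(peak_gflops, watts):
    """Energy efficiency implied by the quoted throughput and power."""
    return peak_gflops / watts


if __name__ == "__main__":
    print(flops_per_cycle_per_core(PEAK_GFLOPS, CORES, BASE_GHZ))  # 4.0
    print(gflops_per_watt(PEAK_GFLOPS, POWER_WATTS))               # 5.12
```

Under those assumptions, the announcement implies 4 floating-point operations per cycle per core and roughly 5 GFLOPS per watt.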
The Fujitsu Journal has posted details on a recent Hot Chips presentation by Toshio Yoshida about the instruction set architecture (ISA) of the Post-K processor. “The Post-K processor employs the ARM ISA, developed by ARM Ltd., with enhancements for supercomputer use. Meanwhile, Fujitsu has been developing the microarchitecture of the processor. In Fujitsu’s presentation, we also explained that our development of mainframe processors and UNIX server SPARC processors will continue into the future. The reason that Fujitsu is able to continuously develop multiple processors is our shared microarchitecture approach to processor development.”
Over at the ARM Community Blog, Nigel Stephens writes that the company has introduced scalable vector extensions (SVE) to its A64 instruction set to bolster high performance computing. Fujitsu is developing a new HPC processor conforming to ARMv8-A with SVE for the Post-K computer.
ARM processors will provide the computational muscle behind one of the most powerful supercomputers in the world, replacing the current K computer at the RIKEN Advanced Institute for Computational Science (AICS) in Japan. At the ISC conference, Fujitsu vice president Toshiyuki Shimizu presented details of the new system. Shimizu stated that the “post K” system, which is set to go live in 2020, will have 100 times more application performance than the K supercomputer.
Now that ARM has been acquired, the big question is how much SoftBank will invest in bolstering ARM's chips for HPC. Meanwhile, ARM continues to gain traction, as evidenced by today’s announcement that a paper on the ARM-based Mont-Blanc Project has been selected as a Best Paper Finalist for SC16. Entitled “The Mont-Blanc prototype: An Alternative Approach for HPC Systems,” the paper was written by Nikola Rajovic, a BSC researcher involved in the Mont-Blanc project since its beginnings.
Today Cavium announced ThunderX2, its second generation of Workload-Optimized ARM server SoCs. ThunderX2 targets high performance volume servers deployed by Public/Private Cloud and Telco data centers and high performance computing applications. “Optimized for key Data Center workloads, ThunderX2 will deliver comparable performance at a better total cost of ownership compared to the next generation of traditional server processors.”
“Unified Communication X (UCX) is a set of network APIs and their implementations for high performance computing. UCX comes from the combined efforts of national laboratories, industry, and academia to co-design and implement a high-performing and highly scalable communication APIs for next generation applications and systems. UCX solves the problem of moving data memory location “A” to memory location “B” considering across multiple type of memories (DRAM, accelerator memories, etc.) and multiple transports (e.g. InfiniBand, uGNI, Shared Memory, CUDA, etc.), while minimizing latency, and maximizing bandwidth and message rate.”
Today ThinkParQ announced that the complete BeeGFS parallel file system is now available as open source. Built specifically for performance-critical environments, BeeGFS was developed with a strong focus on easy installation and high flexibility, including converged setups where storage servers are also used for compute jobs. By increasing the number of servers and disks in the system, the performance and capacity of the file system can be scaled out to the desired level, from small clusters up to enterprise-class systems with thousands of nodes.
In this video from the 2015 Hot Chips Conference, Charles Zhang from Phytium presents: Mars – A 64-Core ARMv8 Processor. Formed in China in 2012, Phytium is a technology provider of HPC servers, focusing mainly on high performance general-purpose microprocessors, accelerator chips, reference board designs, and server designs ranging from blade, cluster, and standard stack to HPC servers. “Optimized for HPC, the Mars chip features eight panels, each with eight ‘Xiaomi’ cores. The panels share an L2 cache of 32 MB, two Directory Control Units and a routing cell for the internal mesh.”
Today Centerprise International (Ci) in the UK announced a collaboration with E4 Computer Engineering to develop next-generation datacenter technologies for HPC. “This is an exciting development for both companies, as it combines the specialist knowledge of E4 in the field of high performance computing with our considerable experience in building quality, customized hardware solutions and our expansive reach in the UK IT channel,” said Jeremy Nash, Centerprise Sales Director.