

Harvard Names New Lenovo HPC Cluster after Astronomer Annie Jump Cannon

Harvard has deployed a liquid-cooled supercomputer from Lenovo at its FASRC computing center. The system, named “Cannon” in honor of astronomer Annie Jump Cannon, is a large-scale HPC cluster supporting scientific modeling and simulation for thousands of Harvard researchers.

Assembled with the support of the Faculty of Arts and Sciences and since expanded to serve many Harvard units, Cannon occupies more than 10,000 square feet, with hundreds of racks spanning three data centers separated by 100 miles. The primary compute is housed in MGHPCC, our green (LEED Platinum) data center in Holyoke, MA. Other systems, including storage, login, virtual machines, and specialty compute, are housed in our Boston and Cambridge facilities.

This new cluster will have 30,000 cores of Intel 8268 “Cascade Lake” processors. Each node will have 48 cores and 192 GB of RAM. The interconnect is HDR 100 Gbps InfiniBand (IB) connected in a single fat tree with a 200 Gbps IB core. The entire system is water-cooled, which will allow us to run these processors at a much higher clock rate of ~3.4 GHz. In addition to the general-purpose compute resources, we are also installing 16 SR670 servers, each with four Nvidia V100 GPUs and 384 GB of RAM, all connected by HDR IB.
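
The figures in the paragraph above imply a few derived numbers that are not stated directly. The following is a minimal back-of-the-envelope sketch, assuming the quoted 30,000 cores is a round total and using decimal units (1 TB = 1,000 GB); the function and constant names are hypothetical and exist only for illustration.

```python
# Figures quoted in the article (the core total is treated as approximate).
TOTAL_CORES = 30_000
CORES_PER_NODE = 48
RAM_PER_NODE_GB = 192
GPU_SERVERS = 16
GPUS_PER_SERVER = 4


def cluster_summary(
    total_cores=TOTAL_CORES,
    cores_per_node=CORES_PER_NODE,
    ram_per_node_gb=RAM_PER_NODE_GB,
    gpu_servers=GPU_SERVERS,
    gpus_per_server=GPUS_PER_SERVER,
):
    """Derive node count, memory ratios, and GPU count from the quoted specs."""
    nodes = total_cores // cores_per_node  # whole nodes implied by the core total
    return {
        "nodes": nodes,
        "ram_per_core_gb": ram_per_node_gb / cores_per_node,
        "total_ram_tb": nodes * ram_per_node_gb / 1000,  # decimal terabytes
        "gpus": gpu_servers * gpus_per_server,
    }


if __name__ == "__main__":
    for key, value in cluster_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions, the quoted core total corresponds to 625 nodes with 4 GB of RAM per core, and the GPU partition holds 64 V100s. These are estimates derived from the article's round numbers, not official counts.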

Cannon is based on Lenovo SD650 NeXtScale servers with direct-to-node water-cooling for increased performance, density, ease of expansion, and controlled cooling.

Highlights:

  • Compute: The Cannon cluster is composed primarily of 670 Lenovo SD650 NeXtScale servers, part of Lenovo's new liquid-cooled Neptune line. Each chassis unit contains two nodes, and each node has two Intel 8268 “Cascade Lake” processors and 192 GB of RAM. The nodes are interconnected by HDR 100 Gbps InfiniBand (IB) in a single fat tree with a 200 Gbps IB core. The liquid cooling allows for efficient heat extraction while running at higher clock speeds.
  • Storage: FASRC now maintains over 40 PB of storage, and this keeps growing. Robust home directories are housed on enterprise-grade Isilon storage, while faster Lustre filesystems serve more performance-driven needs such as scratch and research shares. Our middle-tier laboratory storage uses a mix of Lustre, Gluster, and NFS filesystems. See our storage page for more details.
  • Interconnect: Odyssey has two underlying networks: a traditional TCP/IP network and low-latency InfiniBand networks that enable high-throughput messaging for inter-node parallel computing and fast access to Lustre-mounted storage. The IP network topology connects the three data centers together and presents them as a single contiguous environment to FASRC users.
  • Software: The core operating system is CentOS. FASRC maintains the configuration of the cluster and all related machines and services via Puppet. Cluster job scheduling is provided by SLURM (Simple Linux Utility for Resource Management) across several shared partitions, processing approximately 29,000,000 jobs per year.
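
To put the scheduler figure in the Software highlight into perspective, the annual job count can be converted into average rates. This is a rough sketch that assumes jobs are spread evenly over a 365-day year, which real workloads are not; the function name is hypothetical.

```python
JOBS_PER_YEAR = 29_000_000  # approximate figure quoted in the article
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60


def average_job_rates(jobs_per_year=JOBS_PER_YEAR):
    """Return average jobs per day and per second, assuming a uniform load."""
    per_day = jobs_per_year / DAYS_PER_YEAR
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


if __name__ == "__main__":
    per_day, per_second = average_job_rates()
    print(f"about {per_day:,.0f} jobs per day, {per_second:.2f} jobs per second")
```

On that assumption the scheduler handles roughly 79,000 jobs per day, or just under one job per second on average.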
