NREL Report Looks at Aquila’s Cold Plate Cooling System for HPC


Liquid cooling has long been an enabling technology for high performance computing, but cost, complexity, and facility requirements continue to be concerns. Enter the Aquila Group, whose cold-plate technology looks to offer the advantages of liquid cooling with less risk. Based on a 10-month trial, a new paper gives an overview of the Aquarius fixed cold plate cooling technology and provides results from early energy performance evaluation testing.

In the first half of 2018, as part of a partnership with Sandia National Laboratories, Aquila installed its fixed cold plate, liquid-cooled Aquarius rack solution for HPC clustering at the National Renewable Energy Laboratory’s (NREL’s) Energy Systems Integration Facility (ESIF). This fixed cold plate, warm-water cooling technology, combined with a manifold design, provides easy access for servicing nodes and eliminates the need for auxiliary server fans altogether. Aquila and Sandia chose NREL’s HPC Data Center for the initial installation and evaluation because the data center is configured for liquid cooling and has the instrumentation required to measure flow and temperature differences for testing.

Sandia’s Aquila-based HPC cluster, named “Yacumama,” was configured to operate independently of all other HPC systems in the ESIF data center. The cluster comprises 36 compute nodes built on Intel S2600KP motherboards, each configured with dual x86-64 Xeon central processing units, 128 GB of random access memory (RAM), a 128 GB solid-state drive (SSD), and an Omni-Path adapter. The supplied configuration is capable of delivering more than 40 teraflops of LINPACK performance while drawing less than 15 kW of power.
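As a back-of-the-envelope check on those figures, the short Python sketch below (not from the report) converts the quoted LINPACK performance and power draw into an approximate energy-efficiency number; both inputs are taken as the bounds stated above rather than measured values.

```python
# Rough efficiency estimate for the Yacumama cluster, using the figures quoted
# above (>40 TFLOPS LINPACK at under 15 kW). The exact measured values are not
# given here, so treat the result as an illustrative lower bound.

linpack_tflops = 40.0   # reported LINPACK performance, TFLOPS (lower bound)
power_kw = 15.0         # reported power draw, kW (upper bound)

gflops_per_watt = (linpack_tflops * 1000) / (power_kw * 1000)
print(f"Energy efficiency: > {gflops_per_watt:.2f} GFLOPS/W")
# Energy efficiency: > 2.67 GFLOPS/W
```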

In building the data center, NREL’s vision was to create a showcase facility that demonstrates best practices in data center sustainability and serves as an exemplar for the community. The innovation was realized by adopting a holistic “chips to bricks” approach to the data center, focusing on three critical aspects of data center sustainability:

  • Efficiently cool the information technology equipment using direct, component-level liquid cooling with a power usage effectiveness (PUE) design target of 1.06 or better (see the PUE sketch after this list);
  • Capture and reuse the waste heat produced; and
  • Minimize the water used as part of the cooling process. There is no compressor-based cooling system for NREL’s HPC data center. Cooling liquid is supplied indirectly from cooling towers.
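
For readers unfamiliar with the metric, the following minimal Python sketch shows how PUE is computed; the energy figures are hypothetical and chosen only to illustrate what a 1.06 design target looks like in practice.

```python
# Minimal sketch of the power usage effectiveness (PUE) metric referenced in
# the first bullet above. The kWh figures below are hypothetical.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """PUE = total facility energy / IT equipment energy (ideal value is 1.0)."""
    return total_facility_kwh / it_equipment_kwh

it_energy = 1_000_000   # hypothetical annual IT equipment energy, kWh
overhead = 60_000       # hypothetical cooling/power-distribution overhead, kWh

print(f"PUE = {pue(it_energy + overhead, it_energy):.2f}")  # PUE = 1.06
```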

In a companion video, engineers describe the advantages of Aquila Aquarius Liquid Cooling for HPC.

The Yacumama cluster installation was straightforward, and the system was integrated directly into the data center’s existing hydronic system. A round of discovery testing was conducted to identify the range of reasonable supply temperatures to the fixed cold plates and the impact of adjusting facility flow. LINPACK tests at a 100% duty cycle were then run for 48 hours. Results are provided in Section 3 of the report, and the key takeaway is that this fixed cold plate design captures a very high percentage of heat directly to water: up to 98.3% when evaluating the compute nodes only (the figure drops to 93.4% when the system’s Powershelf is included, because the Powershelf is not direct liquid cooled).
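To make the heat-capture figure concrete, here is a hedged Python sketch of how such a fraction can be derived from the facility’s flow and temperature instrumentation; the flow, temperature, and power values below are placeholders, not the report’s measurements.

```python
# Sketch of a heat-capture-to-water calculation like the 98.3% figure cited
# above. Heat removed by the loop is Q = m_dot * c_p * deltaT; the capture
# fraction compares that to measured IT power. All inputs are placeholders.

WATER_CP = 4186.0  # specific heat of water, J/(kg*K)

def heat_to_water_kw(flow_kg_per_s: float, delta_t_c: float) -> float:
    """Heat carried away by the cooling water, in kW."""
    return flow_kg_per_s * WATER_CP * delta_t_c / 1000.0

it_power_kw = 14.0     # placeholder: measured IT power under LINPACK load
flow_kg_per_s = 0.55   # placeholder: facility water flow to the rack
delta_t_c = 6.0        # placeholder: return minus supply water temperature

q_water = heat_to_water_kw(flow_kg_per_s, delta_t_c)
capture_fraction = q_water / it_power_kw
print(f"Heat captured to water: {q_water:.1f} kW "
      f"({capture_fraction:.1%} of IT power)")
```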

The cluster has been in operation for nearly 10 months with zero maintenance required and no water leaks observed. The Yacumama system will be returned to service at Sandia’s recently completed warm-water-cooled HPC data center in early 2019.

Download the report (PDF)
