Immersion Cooling for Transportable HPC

Braden Cooper, Product Marketing Manager, One Stop Systems

The latest high-performance computing systems for AI (Artificial Intelligence) applications generate more heat than ever before. Datacenters have begun adopting immersion cooling solutions, which submerge the temperature-sensitive electronics in a non-conductive fluid that efficiently carries the heat away. In parallel, many AI edge applications are transitioning from low-performing embedded systems to solutions that incorporate more advanced enterprise compute hardware. To solve both the thermal and structural challenges of the rugged edge, system integrators look to immersion cooling technology to meet their environmental specifications.

The NVIDIA SXM form factor GPUs used in HGX platforms dissipated up to 500W each in the A100 generation and dissipate up to 700W each in the H100. Platforms integrating the HGX H100 4-GPU or HGX H100 8-GPU backplanes must therefore dissipate an additional 800W or 1600W of heat, respectively, compared to existing A100-based platforms. This heat load exceeds what much of the existing cooling infrastructure was designed to handle. In rugged edge environments, AI compute integrators seek to leverage the advances in datacenter liquid cooling while also addressing the environmental parameters of the target application.
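The arithmetic behind the 800W and 1600W figures can be sketched in a few lines. This is only an illustration using the per-GPU numbers quoted above; the constant and function names are made up for this example, and the calculation counts GPU heat only, not CPUs, memory, switches, or power conversion losses.

```python
# Per-GPU power figures quoted in the text (watts).
A100_SXM_WATTS = 500
H100_SXM_WATTS = 700


def added_gpu_heat_watts(gpu_count: int) -> int:
    """Extra GPU heat of an H100 platform versus an A100 platform of the same size."""
    return gpu_count * (H100_SXM_WATTS - A100_SXM_WATTS)


for gpus in (4, 8):
    total = gpus * H100_SXM_WATTS
    extra = added_gpu_heat_watts(gpus)
    print(f"HGX H100 {gpus}-GPU: {total} W of GPU heat, {extra} W more than A100")
```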

The characteristics of a target environment vary with where the system is deployed. For example, an autonomous truck requiring high-performance AI computing may see temperatures ranging from below freezing in winter months to extreme heat, exacerbated by local climates. The trucks will also experience the rigors of road travel, including vibration, shock, and humidity. Meanwhile, integrating a high-performance solution on an aircraft may require a system with an even wider operational temperature range, along with a cooling design that mitigates the impact of altitude and lower air density. A common theme in the rugged design of systems for these AI Transportable applications is the need for a robust cooling strategy that supports a wide temperature range and alleviates mechanical stress from external vibration or shock loads.

Immersion cooling is a strong candidate to solve both the thermal and structural elements of these applications. From a thermal standpoint, liquid immersion cooling (either single-phase or two-phase) offers the greatest thermal efficiency of any liquid cooling method. In practice, this means that liquid-immersed systems can operate at a higher external ambient temperature than systems with a direct-to-chip liquid cooling implementation, and at significantly higher ambient temperatures than air-cooled systems (forced or natural convection). On the structural side, the immersion fluid itself acts as a damping mechanism, mitigating the impact of vibration forces on the electronics.

OSS Rigel Edge Supercomputer Two-Phase Immersion System

The two methods of immersion cooling, single-phase and two-phase, each have pros and cons when implemented in edge environments.

Single-phase immersion cooling uses a fluid that remains liquid across the entire target temperature range. This cooling method works similarly to air cooling, in that the fluid is typically directed across a heatsink attached to a heat-dissipating surface. The warm fluid is then pumped to an external heat exchanger, and the cooled fluid is recirculated back into the system. By using the specialized immersion fluid, this method cools more efficiently than direct-to-chip cooling and achieves a lower delta between ambient and maximum fluid temperatures.

Two-phase cooling, by comparison, makes use of the natural heat dissipation properties of the fluid's evaporation cycle. In this method, a fluid is selected whose boiling point is below the maximum operating temperature of key heat-dissipating components. Once the fluid around the components reaches its boiling point, it changes from a liquid to a gaseous state, pulling heat away from the hot components. The gas is then cooled via contact with a condensing coil until it condenses and returns to the liquid in the system. Two-phase cooling is typically more efficient than single-phase, but it adds complexity, as the system must be sufficiently sealed to prevent the fluid from escaping in its gaseous state. Additionally, because ambient pressure falls with altitude, the boiling point of the fluid falls with it, requiring either strict altitude limits or a pressurized system to maintain fluid properties.
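To give a feel for the altitude effect on two-phase cooling, the sketch below estimates how the boiling point of a fluid shifts in an unpressurized tank as altitude increases. It rests on several assumptions that do not come from this article: ambient pressure follows the International Standard Atmosphere, the Clausius-Clapeyron relation holds with a constant enthalpy of vaporization, and the fluid properties (a 50 °C sea-level boiling point and 30 kJ/mol enthalpy of vaporization) are hypothetical values chosen for illustration, not data for any real product.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol*K)

# International Standard Atmosphere constants (troposphere, below about 11 km).
P0 = 101325.0      # sea-level pressure, Pa
T0 = 288.15        # sea-level temperature, K
LAPSE = 0.0065     # temperature lapse rate, K/m
G = 9.80665        # gravitational acceleration, m/s^2
M_AIR = 0.0289644  # molar mass of dry air, kg/mol

# HYPOTHETICAL fluid properties, for illustration only.
FLUID_BP_SEA_LEVEL_C = 50.0  # boiling point at 101325 Pa, deg C
FLUID_DH_VAP = 30_000.0      # molar enthalpy of vaporization, J/mol


def ambient_pressure_pa(altitude_m: float) -> float:
    """Standard-atmosphere pressure at a given altitude (barometric formula)."""
    return P0 * (1.0 - LAPSE * altitude_m / T0) ** (G * M_AIR / (R * LAPSE))


def boiling_point_c(altitude_m: float) -> float:
    """Boiling point at ambient pressure via the Clausius-Clapeyron relation."""
    t1 = FLUID_BP_SEA_LEVEL_C + 273.15  # sea-level boiling point, K
    p2 = ambient_pressure_pa(altitude_m)
    inv_t2 = 1.0 / t1 - R * math.log(p2 / P0) / FLUID_DH_VAP
    return 1.0 / inv_t2 - 273.15


for altitude in (0, 1500, 3000, 4500):
    print(f"{altitude:>5} m: {ambient_pressure_pa(altitude) / 1000:6.1f} kPa, "
          f"boils near {boiling_point_c(altitude):5.1f} C")
```

With these assumed values, the estimated boiling point at 3,000 m is on the order of 10 °C lower than at sea level. A sealed, pressurized enclosure would decouple the fluid from ambient pressure and avoid this shift, which is the trade-off described above.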

Immersion technology has already made a significant impact on datacenter high-performance computing scale-out, providing an efficient upgrade path to support the next generation of AI computing hardware. Adapting these technologies to rugged edge environments is inevitable as thermal dissipation requirements continue to grow. As AI deployments become more common in autonomous vehicles and other edge domains, system integrators will look to immersion technology as a strong candidate to solve the thermal and structural challenges of these environments.