Liquid Cooling for AI Datacenters

Summary • 12 Data Sources

Why is liquid cooling essential for AI datacenters?

H100 GPUs draw up to 700 W per chip, which makes air cooling insufficient for high-density AI clusters. Liquid cooling enables rack densities of 80-100 kW (vs 10-15 kW air-cooled), a PUE of 1.05-1.15 (vs 1.3-1.5 for air), and a 30-40% reduction in cooling energy. Direct-to-chip is the mainstream solution; immersion offers benefits for extreme density but requires infrastructure changes.

Key Data Points

Rack Density: 80-100 kW (Liquid) vs 10-15 kW (Air)
PUE Efficiency: 1.05-1.15 vs 1.3-1.5
Energy Reduction: 30-40% cooling energy savings
Mainstream Tech: Direct-to-Chip (D2C) Cold Plates
Emerging Tech: Two-phase Immersion Cooling

Why AI Datacenters Need Liquid Cooling

Heat Density

An 8-GPU DGX H100 produces 10.2 kW of heat. A single 42U rack can hold four or more systems, which puts the rack at 40+ kW. Air cooling is limited to ~15 kW/rack.
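The rack arithmetic above can be sketched in a few lines of Python. This is a minimal illustration using only the 10.2 kW and ~15 kW figures quoted in the text; the function names are hypothetical and chosen for this example.

```python
import math

DGX_H100_KW = 10.2    # heat load per 8-GPU system, from the text above
AIR_LIMIT_KW = 15.0   # approximate air-cooling ceiling per rack, from the text above


def rack_heat_kw(systems_per_rack: int, system_kw: float = DGX_H100_KW) -> float:
    """Total heat load of a rack holding identical systems."""
    return systems_per_rack * system_kw


def max_systems_within(limit_kw: float, system_kw: float = DGX_H100_KW) -> int:
    """Largest whole number of systems whose combined heat stays within limit_kw."""
    return math.floor(limit_kw / system_kw)


if __name__ == "__main__":
    for n in (1, 2, 4):
        load = rack_heat_kw(n)
        verdict = "within" if load <= AIR_LIMIT_KW else "exceeds"
        print(f"{n} system(s): {load:.1f} kW ({verdict} the ~{AIR_LIMIT_KW:.0f} kW air limit)")
    print("Systems per air-cooled rack:", max_systems_within(AIR_LIMIT_KW))
```

Under these figures an air-cooled rack fits a single system, so most of the rack's physical space goes unused.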

Energy Efficiency

Water can absorb roughly 3,500x more heat per unit volume than air. This contributes to a PUE of 1.05-1.15 vs 1.3-1.5 for air, reducing total facility power consumption by 15-25%.
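Because PUE is total facility power divided by IT power, the saving in total power follows directly from the two PUE values. The sketch below assumes an unchanged IT load and uses the midpoints of the ranges above (1.40 and 1.10) purely as illustrative inputs; the function names are hypothetical.

```python
def facility_power_mw(it_load_mw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE."""
    return it_load_mw * pue


def total_power_reduction(pue_air: float, pue_liquid: float) -> float:
    """Fractional drop in total facility power for the same IT load."""
    return 1 - pue_liquid / pue_air


if __name__ == "__main__":
    it_load_mw = 10.0  # assumed IT load for illustration
    air = facility_power_mw(it_load_mw, 1.40)
    liquid = facility_power_mw(it_load_mw, 1.10)
    saving = total_power_reduction(1.40, 1.10)
    print(f"Air: {air:.1f} MW, liquid: {liquid:.1f} MW, reduction: {saving:.1%}")
```

Other pairings within the quoted PUE ranges give larger or smaller reductions, so the result should be read as an estimate for one pair of inputs.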

Density Economics

Higher density means less floor space, shorter cable runs, and reduced facility costs. A liquid-cooled cluster may need 50% less space than an air-cooled equivalent.

Liquid Cooling Technologies

Direct-to-Chip (D2C)

Cold plates attach directly to the GPU and CPU, and liquid circulates through rack-level coolant distribution units (CDUs).

Heat Removal: 60-80% of total
Rack Density: 60-100 kW/rack
PUE Impact: 1.05-1.15
Capex Premium: 15-25% over air
Maturity: Production-ready

Vendors: CoolIT, Asetek, Vertiv, Zutacore

Immersion Cooling

Entire servers are submerged in dielectric fluid, which removes 100% of the heat.

Heat Removal: 100% of total
Rack Density: 100-250+ kW/tank
PUE Impact: 1.02-1.08
Capex Premium: 40-60% over air
Maturity: Emerging

Vendors: GRC, LiquidCool, Submer, Iceotope

Cooling Technology Comparison

Factor	Air Cooling	Direct-to-Chip	Immersion
Max Rack Density	10-15 kW	60-100 kW	100-250+ kW
PUE	1.3-1.5	1.05-1.15	1.02-1.08
Capex ($/kW)	$150-300	$200-400	$300-500
Water Usage	High (evaporative)	Medium (closed loop)	Minimal
Serviceability	Easy	Moderate	More complex
Retrofit Difficulty	N/A (baseline)	Moderate	Major overhaul
Best For	Legacy, low-density	AI/HPC clusters	Max density, new builds
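As a rough illustration of how the density row of the table can be used, the sketch below filters the three options by a target rack load. The class and function names are hypothetical, and treating 250 kW as the immersion ceiling is an assumption, since the table gives that bound as open-ended ("250+").

```python
from typing import List, NamedTuple, Tuple


class CoolingOption(NamedTuple):
    name: str
    max_rack_kw: float              # upper end of "Max Rack Density" in the table
    pue_range: Tuple[float, float]  # "PUE" row in the table


OPTIONS = [
    CoolingOption("Air Cooling", 15, (1.3, 1.5)),
    CoolingOption("Direct-to-Chip", 100, (1.05, 1.15)),
    # Assumption: 250 kW used as the cap, although the table lists "250+".
    CoolingOption("Immersion", 250, (1.02, 1.08)),
]


def candidates(rack_kw: float) -> List[str]:
    """Names of the options whose maximum rack density covers rack_kw."""
    return [opt.name for opt in OPTIONS if rack_kw <= opt.max_rack_kw]


if __name__ == "__main__":
    for load in (12, 40, 150):
        print(f"{load} kW/rack -> {', '.join(candidates(load))}")
```

Density is only the first filter; the capex, serviceability, and retrofit rows still decide between the options that remain.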

Implementation Considerations

Direct-to-Chip Requirements

• CDU placement: In-row or rear-door, ~1 per 2-4 racks
• Water supply: 10-20 GPM per MW, chilled or facility water
• Manifolds: Quick-connect at rack/server level
• Leak detection: Sensors under racks, at manifolds
• Air component: Still need some CRAC/CRAH for remaining heat
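Two of these requirements lend themselves to quick sizing arithmetic: the number of CDUs for a row, and the heat that still has to be removed by air. The sketch below uses the "~1 per 2-4 racks" rule from the list and the 60-80% heat-removal share quoted for direct-to-chip earlier in this guide; the function names and the 20-rack, 80 kW example are assumptions for illustration.

```python
import math
from typing import Tuple


def cdu_count_range(racks: int) -> Tuple[int, int]:
    """(fewest, most) CDUs, at one CDU per 4 racks and one per 2 racks."""
    return math.ceil(racks / 4), math.ceil(racks / 2)


def residual_air_heat_kw(rack_kw: float, liquid_fraction: float) -> float:
    """Heat per rack left for CRAC/CRAH units after the cold plates take their share."""
    return rack_kw * (1 - liquid_fraction)


if __name__ == "__main__":
    low, high = cdu_count_range(20)  # assumed 20-rack row
    print(f"20 racks need roughly {low}-{high} CDUs")
    for fraction in (0.8, 0.6):
        air_kw = residual_air_heat_kw(80, fraction)  # assumed 80 kW rack
        print(f"{fraction:.0%} captured by liquid -> {air_kw:.0f} kW/rack to air")
```

Under these assumptions a fully loaded rack still sends a double-digit kW load to the air side, which is why the room-level air handlers cannot be removed entirely.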

Immersion Requirements

• Tank sizing: Custom tanks, typically 20-40 servers each
• Fluid cost: $1-5/liter dielectric fluid (large volumes)
• Floor loading: Much higher than traditional racks
• Fire suppression: Different requirements vs air-cooled
• Maintenance: Drip-dry procedures, specialized training

TCO Analysis: 10 MW AI Datacenter

Air Cooling

Capex: $2.5M
PUE: 1.40
Average Cooling Power: 4.0 MW
Annual OpEx (@ $0.05/kWh): $1.75M

Direct-to-Chip

Capex: $3.5M
PUE: 1.10
Average Cooling Power: 1.0 MW
Annual OpEx: $0.44M

Immersion

Capex: $4.5M
PUE: 1.05
Average Cooling Power: 0.5 MW
Annual OpEx: $0.22M

Result: D2C breaks even vs air in ~9 months; immersion in ~16 months at $0.05/kWh. Faster payback at higher power prices.
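The break-even figures can be reproduced from the capex and PUE values in this section. The sketch below assumes, as the comparison does, that all power above the 10 MW IT load is cooling overhead and that the plant runs 8,760 hours a year; the function and variable names are hypothetical.

```python
HOURS_PER_YEAR = 8760
IT_LOAD_MW = 10.0
PRICE_PER_KWH = 0.05

# (capex in USD, PUE) taken from the three scenarios above
SCENARIOS = {
    "Air Cooling": (2.5e6, 1.40),
    "Direct-to-Chip": (3.5e6, 1.10),
    "Immersion": (4.5e6, 1.05),
}


def cooling_power_mw(it_load_mw: float, pue: float) -> float:
    """Overhead power, assuming everything above the IT load is cooling."""
    return it_load_mw * (pue - 1)


def annual_opex_usd(cooling_mw: float, price_per_kwh: float) -> float:
    """Yearly energy cost of running the cooling load continuously."""
    return cooling_mw * 1000 * HOURS_PER_YEAR * price_per_kwh


def payback_months_vs_air(name: str) -> float:
    """Months for OpEx savings to repay the extra capex relative to air cooling."""
    air_capex, air_pue = SCENARIOS["Air Cooling"]
    capex, pue = SCENARIOS[name]
    air_opex = annual_opex_usd(cooling_power_mw(IT_LOAD_MW, air_pue), PRICE_PER_KWH)
    opex = annual_opex_usd(cooling_power_mw(IT_LOAD_MW, pue), PRICE_PER_KWH)
    return 12 * (capex - air_capex) / (air_opex - opex)


if __name__ == "__main__":
    for name in ("Direct-to-Chip", "Immersion"):
        print(f"{name}: payback in about {payback_months_vs_air(name):.1f} months")
```

Raising `PRICE_PER_KWH` shortens both payback periods proportionally, since the savings scale linearly with the power price while the capex gap stays fixed.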

Frequently Asked Questions

Can existing datacenters be retrofitted for liquid cooling?

Direct-to-chip retrofit is feasible with CDU additions and manifold installation. Immersion requires significant floor and structural changes. Many operators are adding liquid-ready infrastructure in new builds even if they deploy air cooling initially.

What about GPU warranty with liquid cooling?

NVIDIA supports liquid cooling on DGX and HGX systems. Third-party cold plates on individual GPUs may affect the warranty, so check with NVIDIA and the cold plate vendor. Most enterprise deployments use validated combinations.

How does liquid cooling affect colocation pricing?

Liquid-cooled colo commands a 20-40% premium per kW due to infrastructure requirements. However, higher density means less space is needed, so total cost may be similar or lower. Emerging liquid-ready colos are more competitive.

What is a rear-door heat exchanger (RDHX)?

An RDHX is a passive liquid cooling solution that replaces the rear rack door with a heat exchanger. It can handle 20-30 kW/rack, which places it between air and full D2C. It suits moderate density without server modifications.

Water Risk & Infrastructure Readiness

Evaluating water stress goes hand-in-hand with power and edge network readiness. Datacenters require massive cooling capacity, making municipal water supply and drought risk critical factors in site selection.
