Liquid Cooling for AI Datacenters
Why is liquid cooling essential for AI datacenters?
An H100 SXM GPU draws up to 700 W, which pushes high-density AI clusters beyond what air cooling can handle. Liquid cooling enables rack densities of 80-100 kW (vs 10-15 kW air-cooled), a PUE of 1.05-1.15 (vs 1.3-1.5 for air), and a 30-40% reduction in cooling energy. Direct-to-chip is the mainstream approach; immersion offers benefits at extreme density but requires infrastructure changes.
Key Data Points
- Rack Density: 80-100 kW (Liquid) vs 10-15 kW (Air)
- PUE Efficiency: 1.05-1.15 vs 1.3-1.5
- Energy Reduction: 30-40% cooling energy savings
- Mainstream Tech: Direct-to-Chip (D2C) Cold Plates
- Emerging Tech: Two-phase Immersion Cooling
Why AI Datacenters Need Liquid Cooling
Heat Density
An 8-GPU DGX H100 draws up to 10.2 kW, nearly all of which ends up as heat. A single 42U rack can hold four or more systems, which means 40+ kW per rack. Air cooling tops out at roughly 15 kW per rack.
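The arithmetic behind these figures can be sketched in a few lines. This is a minimal illustration using only the numbers quoted above; the function names are hypothetical and the 15 kW ceiling is the approximate air-cooling limit stated in the text, not a hard specification.

```python
import math

DGX_H100_KW = 10.2   # per-system load quoted in the text
AIR_LIMIT_KW = 15.0  # approximate air-cooled ceiling quoted in the text


def rack_heat_kw(systems_per_rack: int, system_kw: float = DGX_H100_KW) -> float:
    """Total heat load of a rack filled with identical systems."""
    return systems_per_rack * system_kw


def max_air_cooled_systems(limit_kw: float = AIR_LIMIT_KW,
                           system_kw: float = DGX_H100_KW) -> int:
    """How many whole systems fit under the air-cooling ceiling."""
    return math.floor(limit_kw / system_kw)


for n in range(1, 5):
    load = rack_heat_kw(n)
    status = "within" if load <= AIR_LIMIT_KW else "exceeds"
    print(f"{n} system(s): {load:.1f} kW ({status} the air limit)")
```

Under these numbers only one system per rack stays under the air limit, which is why a fully populated rack needs a liquid loop.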
Energy Efficiency
Water carries roughly 3,500x more heat per unit volume than air, so far less fluid has to be moved to remove the same heat. This translates to a PUE of 1.05-1.15 vs 1.3-1.5 for air, reducing total facility power consumption by 15-25%.
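The link between PUE and total power can be checked directly, since PUE is total facility power divided by IT power. The sketch below assumes the IT load stays the same when the cooling method changes; the function names are illustrative only.

```python
def facility_power_mw(it_load_mw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE."""
    return it_load_mw * pue


def total_power_reduction(pue_air: float, pue_liquid: float) -> float:
    """Fractional drop in facility power at a fixed IT load."""
    return 1.0 - pue_liquid / pue_air


# Midpoints of the ranges quoted in the text: 1.4 (air) vs 1.1 (liquid).
print(f"Midpoint case: {total_power_reduction(1.4, 1.1):.1%}")
# Narrowest and widest gaps between the two ranges.
print(f"Narrow gap:    {total_power_reduction(1.3, 1.15):.1%}")
print(f"Wide gap:      {total_power_reduction(1.5, 1.05):.1%}")
```

The midpoint case comes out at about 21%, inside the 15-25% range quoted above. The extreme pairings fall outside that range in both directions.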
Density Economics
Higher density means less floor space, shorter cable runs, and reduced facility costs. A liquid-cooled cluster may need 50% less space than an air-cooled equivalent.
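A rough way to see the density effect is to count racks for a fixed IT load. The sketch below is illustrative only: the 10 MW load and the chosen densities are example inputs. Rack count is also not the same as floor space, because CDUs, aisles, and power gear take room too, so the reduction in rack count will be larger than the reduction in floor area.

```python
import math


def racks_needed(it_load_kw: float, rack_density_kw: float) -> int:
    """Whole racks required to host an IT load at a given density."""
    return math.ceil(it_load_kw / rack_density_kw)


IT_LOAD_KW = 10_000  # example: a 10 MW cluster

for label, density_kw in [("air, 15 kW/rack", 15),
                          ("liquid, 40 kW/rack", 40),
                          ("liquid, 80 kW/rack", 80)]:
    print(f"{label}: {racks_needed(IT_LOAD_KW, density_kw)} racks")
```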
Liquid Cooling Technologies
Direct-to-Chip (D2C)
Cold plates attach directly to the GPU and CPU packages, and coolant circulates through them via rack- or row-level coolant distribution units (CDUs).
Vendors: CoolIT, Asetek, Vertiv, Zutacore
Immersion Cooling
Entire servers are submerged in dielectric fluid, which captures nearly all of the server heat.
Vendors: GRC, LiquidCool, Submer, Iceotope
Cooling Technology Comparison
| Factor | Air Cooling | Direct-to-Chip | Immersion |
|---|---|---|---|
| Max Rack Density | 10-15 kW | 60-100 kW | 100-250+ kW |
| PUE | 1.3-1.5 | 1.05-1.15 | 1.02-1.08 |
| Capex ($/kW) | $150-300 | $200-400 | $300-500 |
| Water Usage | High (evaporative) | Medium (closed loop) | Minimal |
| Serviceability | Easy | Moderate | More complex |
| Retrofit Difficulty | N/A (baseline) | Moderate | Major overhaul |
| Best For | Legacy, low-density | AI/HPC clusters | Max density, new builds |
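One way to read the table is as a lookup: given a target rack density, pick the least disruptive option whose maximum density covers it. The sketch below encodes only the upper bounds from the "Max Rack Density" row. The ordering by retrofit effort and the function name are assumptions made for illustration, and real selection would weigh capex, water, and serviceability as well.

```python
from typing import Optional

# Upper bound of each "Max Rack Density" range (kW per rack),
# ordered from least to most retrofit effort.
MAX_DENSITY_KW = [
    ("Air Cooling", 15),
    ("Direct-to-Chip", 100),
    ("Immersion", 250),  # the table lists "250+", so this is a soft ceiling
]


def simplest_option(rack_kw: float) -> Optional[str]:
    """Return the first option whose ceiling covers the requested density."""
    for name, ceiling_kw in MAX_DENSITY_KW:
        if rack_kw <= ceiling_kw:
            return name
    return None  # beyond the ranges listed in the table


for target in (12, 40.8, 150):
    print(f"{target} kW/rack -> {simplest_option(target)}")
```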
Implementation Considerations
Direct-to-Chip Requirements
- CDU placement: In-row or rear-door, ~1 per 2-4 racks
- Water supply: 10-20 GPM per MW, chilled or facility water
- Manifolds: Quick-connect at rack/server level
- Leak detection: Sensors under racks, at manifolds
- Air component: Still need some CRAC/CRAH for remaining heat
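When sizing CDUs and manifolds, the circulating coolant flow follows from the heat balance Q = m·cp·ΔT. The sketch below is a first-order estimate under stated assumptions: plain water properties, all rack heat going into the liquid, and a hypothetical 10 K temperature rise across the rack. Glycol mixtures and the residual air-cooled share would change the result.

```python
WATER_DENSITY_KG_M3 = 997.0  # assumption: plain water near 25 °C
WATER_CP_J_KG_K = 4180.0     # assumption: plain water
LITERS_PER_US_GALLON = 3.785411784


def coolant_flow_lpm(heat_kw: float, delta_t_k: float) -> float:
    """Liters per minute needed to carry heat_kw at a given temperature rise."""
    flow_m3_s = (heat_kw * 1000.0) / (
        WATER_DENSITY_KG_M3 * WATER_CP_J_KG_K * delta_t_k
    )
    return flow_m3_s * 1000.0 * 60.0


def lpm_to_gpm(lpm: float) -> float:
    """Convert liters per minute to US gallons per minute."""
    return lpm / LITERS_PER_US_GALLON


# Example: an 80 kW rack with a 10 K coolant temperature rise (hypothetical).
lpm = coolant_flow_lpm(80, 10)
print(f"{lpm:.0f} L/min, or {lpm_to_gpm(lpm):.0f} GPM")
```

This is loop flow at the assumed ΔT. It is not necessarily the same quantity as the water supply figure in the list above.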
Immersion Requirements
- Tank sizing: Custom tanks, typically 20-40 servers each
- Fluid cost: $1-5/liter dielectric fluid (large volumes)
- Floor loading: Much higher than traditional racks
- Fire suppression: Different requirements vs air-cooled
- Maintenance: Drip-dry procedures, specialized training
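Tank count and fluid budget can be roughed out from the figures above. In the sketch below, the servers-per-tank and price-per-liter ranges come from the list. The fluid volume per tank is a placeholder assumption, not a vendor figure, and should be replaced with the tank manufacturer's specification.

```python
import math
from typing import Tuple

LITERS_PER_TANK = 1000.0  # placeholder assumption, NOT from the source


def tanks_needed(servers: int, servers_per_tank: int) -> int:
    """Whole tanks required for a server count."""
    return math.ceil(servers / servers_per_tank)


def fluid_cost_range(tanks: int,
                     liters_per_tank: float = LITERS_PER_TANK,
                     low_per_liter: float = 1.0,
                     high_per_liter: float = 5.0) -> Tuple[float, float]:
    """Low and high fluid cost in dollars at the $1-5/liter range."""
    volume = tanks * liters_per_tank
    return (volume * low_per_liter, volume * high_per_liter)


servers = 200  # example fleet size
for per_tank in (20, 40):
    n = tanks_needed(servers, per_tank)
    low, high = fluid_cost_range(n)
    print(f"{per_tank} servers/tank: {n} tanks, fluid ${low:,.0f}-${high:,.0f}")
```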
TCO Analysis: 10 MW AI Datacenter
Air Cooling
Direct-to-Chip
Immersion
Result: D2C breaks even vs air in ~9 months; immersion in ~16 months at $0.05/kWh. Faster payback at higher power prices.
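The payback logic can be sketched as extra capex divided by monthly energy savings. The inputs behind the ~9 and ~16 month figures are not shown here, so the example below substitutes midpoints from the comparison table (capex per kW and PUE) and ignores maintenance, fluid, and water costs. Its results are therefore illustrative and do not reproduce the stated figures exactly.

```python
HOURS_PER_MONTH = 8760 / 12


def payback_months(it_load_kw: float,
                   capex_air_per_kw: float,
                   capex_alt_per_kw: float,
                   pue_air: float,
                   pue_alt: float,
                   price_per_kwh: float) -> float:
    """Months for energy savings to repay the cooling capex premium."""
    extra_capex = (capex_alt_per_kw - capex_air_per_kw) * it_load_kw
    saved_kw = (pue_air - pue_alt) * it_load_kw
    monthly_savings = saved_kw * HOURS_PER_MONTH * price_per_kwh
    return extra_capex / monthly_savings


IT_LOAD_KW = 10_000  # 10 MW
# Table midpoints (assumed inputs): capex $225 / $300 / $400 per kW,
# PUE 1.4 / 1.1 / 1.05 for air, direct-to-chip, and immersion.
d2c = payback_months(IT_LOAD_KW, 225, 300, 1.4, 1.1, 0.05)
imm = payback_months(IT_LOAD_KW, 225, 400, 1.4, 1.05, 0.05)
print(f"D2C: {d2c:.1f} months, immersion: {imm:.1f} months")

# Doubling the power price halves the payback period.
print(f"D2C at $0.10/kWh: "
      f"{payback_months(IT_LOAD_KW, 225, 300, 1.4, 1.1, 0.10):.1f} months")
```

Because payback is inversely proportional to the power price in this model, higher prices shorten it, which matches the note above.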
Frequently Asked Questions
Can existing datacenters be retrofitted for liquid cooling?
Direct-to-chip retrofit is feasible with CDU additions and manifold installation. Immersion requires significant floor and structural changes. Many operators are adding liquid-ready infrastructure in new builds even if deploying air initially.
What about GPU warranty with liquid cooling?
NVIDIA supports liquid cooling on DGX and HGX systems. Third-party cold plates on individual GPUs may affect warranty, so check with NVIDIA and the cold plate vendor. Most enterprise deployments use validated combinations.
How does liquid cooling affect colocation pricing?
Liquid-cooled colo commands a 20-40% premium per kW due to infrastructure requirements. However, higher density means less space is needed, so total cost may be similar or lower. Emerging liquid-ready colos are more competitive.
What is a rear-door heat exchanger (RDHX)?
An RDHX is a passive liquid cooling solution that replaces the rear rack door with a heat exchanger. It can handle 20-30 kW/rack, which places it between air cooling and full D2C. It suits moderate densities because it requires no server modifications.