INFRASTRUCTURE

Liquid Cooling for AI Datacenters


Why is liquid cooling essential for AI datacenters?

H100 GPUs draw up to 700 W per chip, which makes air cooling insufficient for high-density AI clusters. Liquid cooling enables rack densities of 80-100 kW (vs 10-15 kW air-cooled), a PUE of 1.05-1.15 (vs 1.3-1.5 for air), and a 30-40% reduction in cooling energy. Direct-to-chip is the mainstream solution; immersion suits extreme densities but requires infrastructure changes.

Key Data Points

  • Rack Density: 80-100 kW (Liquid) vs 10-15 kW (Air)
  • PUE: 1.05-1.15 (Liquid) vs 1.3-1.5 (Air)
  • Energy Reduction: 30-40% cooling energy savings
  • Mainstream Tech: Direct-to-Chip (D2C) Cold Plates
  • Emerging Tech: Two-phase Immersion Cooling

Why AI Datacenters Need Liquid Cooling

Heat Density

An 8-GPU DGX H100 produces up to 10.2 kW of heat. A single 42U rack can hold four or more systems, which puts the rack at 40+ kW. Air cooling is limited to ~15 kW/rack.
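The rack arithmetic can be checked with a short sketch. The 10.2 kW per system and the ~15 kW air limit come from the text; the function name and the choice of four systems per rack are illustrative assumptions.

```python
# Figures taken from the text; names are illustrative.
DGX_H100_KW = 10.2    # heat per 8-GPU system
AIR_LIMIT_KW = 15.0   # approximate air-cooling ceiling per rack


def rack_heat_kw(systems_per_rack: int, kw_per_system: float = DGX_H100_KW) -> float:
    """Total heat load of one rack in kW."""
    return systems_per_rack * kw_per_system


load = rack_heat_kw(4)
print(f"{load:.1f} kW per rack, {load / AIR_LIMIT_KW:.1f}x the air-cooling limit")
```

With four systems the rack lands at roughly 2.7 times what air cooling can remove, before counting networking or storage gear.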

Energy Efficiency

Water can carry roughly 3,500x more heat per unit volume than air. In practice this means a PUE of 1.05-1.15 vs 1.3-1.5 for air, which cuts total facility power by 15-25% at the same IT load.
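The link between PUE and total power can be sketched as follows. Facility power is IT power times PUE, so the saving depends only on the ratio of the two PUE values. The PUE inputs of 1.40 and 1.10 are representative points inside the ranges quoted above, and the function name is illustrative.

```python
def total_power_reduction(pue_air: float, pue_liquid: float) -> float:
    """Fractional drop in total facility power at a fixed IT load.

    Facility power = IT power * PUE, so the IT load cancels out.
    """
    return 1.0 - pue_liquid / pue_air


print(f"{total_power_reduction(1.40, 1.10):.1%}")  # 21.4%
```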

Density Economics

Higher density means less floor space, shorter cable runs, and lower facility costs. A liquid-cooled cluster may need 50% less space than an air-cooled equivalent.

Liquid Cooling Technologies

Direct-to-Chip (D2C)

Cold plates attach directly to the GPU and CPU, and coolant circulates through them via rack-level coolant distribution units (CDUs).

  • Heat Removal: 60-80% of total
  • Rack Density: 60-100 kW/rack
  • PUE Impact: 1.05-1.15
  • Capex Premium: 15-25% over air
  • Maturity: Production-ready

Vendors: CoolIT, Asetek, Vertiv, Zutacore

Immersion Cooling

Entire servers are submerged in dielectric fluid, which removes 100% of the heat.

  • Heat Removal: 100% of total
  • Rack Density: 100-250+ kW/tank
  • PUE Impact: 1.02-1.08
  • Capex Premium: 40-60% over air
  • Maturity: Emerging

Vendors: GRC, LiquidCool, Submer, Iceotope

Cooling Technology Comparison

Factor              | Air Cooling         | Direct-to-Chip       | Immersion
Max Rack Density    | 10-15 kW            | 60-100 kW            | 100-250+ kW
PUE                 | 1.3-1.5             | 1.05-1.15            | 1.02-1.08
Capex ($/kW)        | $150-300            | $200-400             | $300-500
Water Usage         | High (evaporative)  | Medium (closed loop) | Minimal
Serviceability      | Easy                | Moderate             | More complex
Retrofit Difficulty | N/A (baseline)      | Moderate             | Major overhaul
Best For            | Legacy, low-density | AI/HPC clusters      | Max density, new builds

Implementation Considerations

Direct-to-Chip Requirements

  • CDU placement: In-row or rear-door, ~1 per 2-4 racks
  • Water supply: 10-20 GPM per MW, chilled or facility water
  • Manifolds: Quick-connect at rack/server level
  • Leak detection: Sensors under racks, at manifolds
  • Air component: Still need some CRAC/CRAH for remaining heat
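For early planning, the "~1 CDU per 2-4 racks" guideline can be turned into a quick range estimate. This is a rough sketch: the function name and the 100-rack example are illustrative assumptions, and real CDU counts depend on the capacity of the chosen unit and on redundancy requirements.

```python
import math


def cdu_count_range(racks: int, racks_per_cdu_min: int = 2,
                    racks_per_cdu_max: int = 4) -> tuple[int, int]:
    """Return (fewest, most) CDUs for a given rack count.

    Fewest assumes each CDU serves 4 racks; most assumes 2 racks per CDU.
    """
    fewest = math.ceil(racks / racks_per_cdu_max)
    most = math.ceil(racks / racks_per_cdu_min)
    return fewest, most


print(cdu_count_range(100))  # (25, 50)
```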

Immersion Requirements

  • Tank sizing: Custom tanks, typically 20-40 servers each
  • Fluid cost: $1-5/liter dielectric fluid (large volumes)
  • Floor loading: Much higher than traditional racks
  • Fire suppression: Different requirements vs air-cooled
  • Maintenance: Drip-dry procedures, specialized training

TCO Analysis: 10 MW AI Datacenter

Air Cooling

  • Capex: $2.5M
  • PUE: 1.40
  • Cooling Power (average): 4.0 MW
  • Annual OpEx (cooling energy, @$0.05/kWh): $1.75M

Direct-to-Chip

  • Capex: $3.5M
  • PUE: 1.10
  • Cooling Power (average): 1.0 MW
  • Annual OpEx (cooling energy, @$0.05/kWh): $0.44M

Immersion

  • Capex: $4.5M
  • PUE: 1.05
  • Cooling Power (average): 0.5 MW
  • Annual OpEx (cooling energy, @$0.05/kWh): $0.22M

Result: at $0.05/kWh, D2C recovers its extra capex vs air in ~9 months and immersion in ~16 months. Payback is faster at higher power prices.
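The break-even figures can be reproduced from the capex and cooling-power numbers in this section. This is a rough sketch: it counts only cooling energy at a flat price and ignores maintenance, fluid replacement, and financing. The dictionary layout and function names are illustrative.

```python
HOURS_PER_YEAR = 8760
PRICE_PER_KWH = 0.05  # $/kWh, from the text

# (capex in $M, average cooling power in MW) for a 10 MW facility, from the text
OPTIONS = {
    "air": (2.5, 4.0),
    "d2c": (3.5, 1.0),
    "immersion": (4.5, 0.5),
}


def annual_opex_musd(cooling_mw: float, price: float = PRICE_PER_KWH) -> float:
    """Annual cooling energy cost in $M."""
    return cooling_mw * 1000 * HOURS_PER_YEAR * price / 1e6


def payback_months(option: str, baseline: str = "air",
                   price: float = PRICE_PER_KWH) -> float:
    """Months for energy savings to repay the extra capex over the baseline."""
    capex, mw = OPTIONS[option]
    base_capex, base_mw = OPTIONS[baseline]
    extra_capex = capex - base_capex
    annual_savings = annual_opex_musd(base_mw, price) - annual_opex_musd(mw, price)
    return 12 * extra_capex / annual_savings


for name in ("d2c", "immersion"):
    print(f"{name}: {payback_months(name):.1f} months")
# d2c: 9.1 months
# immersion: 15.7 months
```

Passing a higher `price` shortens both paybacks proportionally, since the capex gap stays fixed while the annual savings scale with the power price.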

Frequently Asked Questions

Can existing datacenters be retrofitted for liquid cooling?

Direct-to-chip retrofit is feasible with CDU additions and manifold installation. Immersion requires significant floor and structural changes. Many operators are adding liquid-ready infrastructure in new builds even if deploying air initially.

What about GPU warranty with liquid cooling?

NVIDIA supports liquid cooling on DGX and HGX systems. Third-party cold plates on individual GPUs may affect warranty, so check with NVIDIA and the cold plate vendor. Most enterprise deployments use validated combinations.

How does liquid cooling affect colocation pricing?

Liquid-cooled colo commands a 20-40% premium per kW due to infrastructure requirements. However, higher density means less space is needed, so total cost may be similar or lower. Emerging liquid-ready colos are more competitive.

What is a rear-door heat exchanger (RDHX)?

RDHX is a passive liquid cooling solution that replaces the rear rack door with a heat exchanger. It can handle 20-30 kW/rack, which places it between air and full D2C. It suits moderate densities and requires no server modifications.
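The density ranges quoted throughout this guide can be combined into a simple first-pass selector. This is only a sketch: the upper bounds (15, 30, and 100 kW per rack) come from the text, but treating them as hard cutoffs, and assigning the 30-60 kW gap to direct-to-chip, are simplifying assumptions. The function name is illustrative.

```python
def suggest_cooling(rack_kw: float) -> str:
    """Map a rack heat load (kW) to a cooling approach.

    Upper bounds come from the ranges quoted in this guide; treating them
    as hard cutoffs is a simplification for illustration.
    """
    if rack_kw <= 0:
        raise ValueError("rack_kw must be positive")
    if rack_kw <= 15:
        return "air"
    if rack_kw <= 30:
        return "rear-door heat exchanger"
    if rack_kw <= 100:
        return "direct-to-chip"
    return "immersion"


for kw in (12, 25, 40.8, 150):
    print(kw, "->", suggest_cooling(kw))
```

A real selection would also weigh retrofit difficulty, water availability, and serviceability from the comparison table.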



Water Risk & Infrastructure Readiness

Evaluating water stress goes hand-in-hand with power and edge network readiness. Datacenters require massive cooling capacity, making municipal water availability and drought risk critical factors in site selection.


Edge Infrastructure Risk Assessment

Edge readiness is critical for latency-sensitive AI inference workloads. Our Edge Risk Index evaluates fiber density, network latency, and colocation availability across 20+ major markets to help you optimize distributed inference deployments.


Power Infrastructure Risk Assessment

Power availability is the primary constraint for AI datacenter deployment. Our Power Risk Index evaluates interconnection queues, curtailment exposure, and behind-the-meter strategies across 20+ major markets to help you de-risk power procurement.
