Liquid vs Air Cooling for H100 and H200 Racks
The honest answer for sovereign buyers planning H100 or H200 capacity in Muscat: if your design exceeds four GPUs per chassis, air cooling is not the conservative choice; it is the risky one. NVIDIA's SXM5 H100 carries a 700-watt thermal envelope per GPU, and the H200 follows the same package. Eight of those in one box push 5.6 kilowatts of waste heat from the GPUs alone, 7 to 8 kilowatts once host CPUs and NVSwitch are counted. That number breaks the assumptions every legacy ministry server room was built around. This guide walks through the cooling decision for an on-premise sovereign deployment, where racks live inside the fortress, not in a hyperscaler's hall.
Why H100 and H200 push the air-cooling envelope
NVIDIA's published thermal guidance for the SXM5 H100 is unambiguous: cold-aisle inlet at or below 25 degrees Celsius, at least 300 cubic feet per minute of airflow per GPU, and validated liquid cooling for any 8-GPU configuration sustaining training-class workloads. NVIDIA's H100 datasheet lists the SXM5 at 700 W TDP, more than twice the legacy V100's 300 W and roughly double a typical air-cooled enterprise GPU. The arithmetic by configuration, worked through in the sketch after this list:
- 4 GPUs per chassis: 2.8 kW GPU heat plus host overhead, 4 to 5 kW total. Air cooling is viable with a strict cold aisle and a contained hot aisle.
- 8 GPUs per chassis (DGX-class): 5.6 kW from the GPUs alone, 7 to 8 kW with host overhead. Sustained training without throttling requires liquid.
- 2 chassis per rack at 8 GPUs each: 14 to 16 kW. No serious data centre tries to remove this with raised-floor air alone.
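A minimal sketch of that arithmetic, assuming the 700 W SXM5 TDP from the datasheet; the per-chassis host overhead constant is an illustrative estimate, not a measured value:

```python
# Rack heat-load arithmetic for SXM5 H100/H200 configurations.
# GPU TDP is from NVIDIA's datasheet; host overhead per chassis
# (CPUs, NVSwitch, DIMMs, fans, PSU losses) is an illustrative estimate.

GPU_TDP_W = 700            # SXM5 H100/H200 thermal design power
HOST_OVERHEAD_W = 1_500    # assumed per-chassis host overhead (illustrative)

def chassis_heat_kw(gpus_per_chassis: int) -> float:
    """Total waste heat for one chassis, in kilowatts."""
    return (gpus_per_chassis * GPU_TDP_W + HOST_OVERHEAD_W) / 1_000

for gpus, chassis_count in [(4, 1), (8, 1), (8, 2)]:
    per_chassis = chassis_heat_kw(gpus)
    print(f"{chassis_count} x {gpus}-GPU chassis: {per_chassis:.1f} kW each, "
          f"{per_chassis * chassis_count:.1f} kW per rack")
```

Swap in your own host overhead figure; the conclusion is not sensitive to it, because the GPU term dominates past four GPUs per chassis.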
Industry coverage of the wider high-density cooling shift notes that once you cross 400 W per GPU, liquid cooling becomes a hard engineering requirement, not a preference. H100 and H200 sit well past that line.
Direct-to-chip vs immersion vs rear-door HX
Three liquid options are mature enough to ship today. Choose based on rack count, retrofit constraints, and operations capability, not on whichever vendor pitches loudest.
Direct liquid cooling (DLC), cold plates on chip
Cold plates with micro-channels mount directly on each GPU and CPU die. A propylene-glycol/water mix circulates through manifolds to a Coolant Distribution Unit (CDU) at the rack or row. CDUs reject heat to facility chilled water or a dry cooler. DLC captures 70 to 80 percent of server heat at the chip and pulls Power Usage Effectiveness (PUE) down to roughly 1.10 to 1.20, against 1.55 to 1.67 for traditional air. It is the standard NVIDIA reference for 8-GPU H100 and H200 systems.
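To put those PUE figures in facility terms, a worked comparison; this is a sketch only, reusing the two-chassis 14 kW rack load from the section above and the midpoints of the PUE ranges quoted:

```python
# Facility power = IT load x PUE. Comparing DLC against traditional air
# for a two-chassis, 16-GPU rack (~14 kW IT load from the figures above).

IT_LOAD_KW = 14.0
PUE_DLC = 1.15          # assumed midpoint of the 1.10-1.20 DLC range
PUE_AIR = 1.60          # assumed midpoint of the 1.55-1.67 air range

facility_dlc = IT_LOAD_KW * PUE_DLC
facility_air = IT_LOAD_KW * PUE_AIR
saving_kw = facility_air - facility_dlc

print(f"Air:   {facility_air:.1f} kW at the meter")
print(f"DLC:   {facility_dlc:.1f} kW at the meter")
print(f"Saved: {saving_kw:.1f} kW, {saving_kw * 8760:,.0f} kWh/year at constant load")
```

On these assumed midpoints the delta is roughly 55 MWh per rack per year at constant load; the exact figure belongs in a site-specific quotation.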
Immersion cooling, single-phase or two-phase
Whole servers are submerged in dielectric fluid. Single-phase circulates the fluid through a heat exchanger; two-phase boils a low-temperature engineered fluid that condenses on a coil above the tank. Immersion captures 100 percent of board heat and removes every fan in the chassis. The tradeoff is operational: tanks cost 50,000 to 100,000 US dollars per unit, fluid is a consumable, and pulling a board for service is messy. Sovereign buyers operating one or two racks rarely justify it.
Rear-door heat exchangers (RDHx)
An RDHx is a passive or active liquid-to-air radiator on the back of the rack. Server fans push hot exhaust through the coil, chilled water inside the door absorbs the heat, and air returns to the room at near-neutral temperature. RDHx is the least intrusive retrofit option and the cheapest entry to liquid: roughly 3,000 to 5,000 US dollars per rack in hardware. It will not, on its own, sustain an 8-GPU H200 chassis under load, but paired with DLC on the GPU sled it captures the residual heat (host CPU, DIMMs, power supplies) so the room itself never sees the load.
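A sketch of the heat split that pairing implies, assuming DLC capture at the midpoint of the 70 to 80 percent range above and the two-chassis rack load from earlier:

```python
# Split of rack heat between the DLC loop and the rear-door coil.
# Capture fraction is an assumed midpoint of the 70-80% DLC figure;
# rack load reuses the two-chassis, 8-GPU configuration above.

RACK_HEAT_KW = 14.0
DLC_CAPTURE = 0.75      # assumed midpoint of the 70-80% range

to_dlc_loop = RACK_HEAT_KW * DLC_CAPTURE
to_rear_door = RACK_HEAT_KW - to_dlc_loop   # host CPU, DIMMs, power supplies

print(f"DLC loop absorbs:  {to_dlc_loop:.1f} kW")
print(f"RDHx coil absorbs: {to_rear_door:.1f} kW")
print("Room-side load:    ~0 kW if the door returns near-neutral air")
```

The residual 3 to 4 kW is well inside what a rear door handles comfortably, which is why the pairing works where either technology alone falls short.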
Retrofit vs greenfield decision
Most Omani sovereign buyers are not building a hyperscale shell. They are converting a Tier-III ministry data hall, or a defence-grade vault, into AI capacity. That changes the calculus.
- One or two racks, existing hall: RDHx alone for 4-GPU configurations. RDHx plus DLC manifolds for any 8-GPU H100 or H200. Skip immersion.
- Three to ten racks, dedicated room: Full DLC with a row-level CDU, RDHx as a polishing stage. Plan chilled-water plant capacity at roughly 1.3 times peak IT load to absorb summer Muscat ambient (sized in the sketch after this list).
- Greenfield campus, more than ten racks: DLC primary, immersion as an option for specific high-density tiles, dry coolers and adiabatic assist sized to Oman climate (45 degrees Celsius peak ambient is the design point, not a hypothetical).
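Plant sizing from that 1.3x rule is a one-liner; the rack count and per-rack load below are illustrative placeholders, not a recommendation:

```python
# Chilled-water plant sizing for a dedicated room, per the 1.3x rule above.

def plant_capacity_kw(racks: int, kw_per_rack: float, margin: float = 1.3) -> float:
    """Plant capacity = peak IT load x margin for Muscat summer ambient."""
    return racks * kw_per_rack * margin

peak_it_kw = 6 * 14.0    # six racks at 14 kW each: illustrative inputs only
print(f"Peak IT load:   {peak_it_kw:.0f} kW")
print(f"Plant capacity: {plant_capacity_kw(6, 14.0):.0f} kW "
      f"(1.3x margin against 45 C peak ambient)")
```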
For deeper rack-power integration, see our AI rack power cooling airgap pillar, the canonical reference for how Hosn ties cooling, UPS, and the classified-network boundary into one buildable specification.
Operational considerations: leaks, fluid, and access
The hardware is the easy part. Operations decide whether a liquid system stays alive past year two.
- Leak detection: Cable-style leak sensors under each manifold and CDU, hard-wired to the building management system, with automatic shutoff valves (interlock logic sketched after this list). Treat the BMS interface like the air-gap interface: documented, tested, and signed off by the security officer.
- Dielectric fluid (immersion): Approved fluids include 3M Novec lineage replacements (post-PFAS phase-out) and engineered hydrocarbon blends. Audit the supplier's compliance posture, not just the spec sheet, before committing a sovereign deployment to a long-tail vendor.
- Maintenance access: DLC adds two failure surfaces: the CDU pumps and the quick-disconnect couplings at every cold plate. Specify N+1 pumps, dripless couplings, and a documented hot-swap drill. Train two operators per shift, not one.
- Coolant chemistry: Annual analysis for biological growth, glycol degradation, and corrosion inhibitor depletion. Add this line to the service contract before you sign, not after the first algae bloom.
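For the leak-detection interlock above, the sequencing matters more than the vendor. A minimal logic sketch; the sensor, valve, and alarm interfaces here are hypothetical stand-ins for whatever the BMS actually exposes:

```python
# Minimal leak-detection interlock logic. ZoneStatus and the callbacks are
# hypothetical stand-ins for a real BMS integration; the sequencing is the
# point: confirm the reading, isolate the manifold, then alarm, never reversed.

from dataclasses import dataclass

@dataclass
class ZoneStatus:
    zone: str        # e.g. "rack-3/manifold-A"
    wet: bool        # cable sensor reports conductive contact
    confirmed: bool  # second poll agrees, filtering transient noise

def on_leak_event(status: ZoneStatus, close_valve, raise_alarm) -> None:
    """Isolate first, alarm second: a stuck alarm must never delay shutoff."""
    if not (status.wet and status.confirmed):
        return                       # single unconfirmed reading: log only
    close_valve(status.zone)         # automatic shutoff at the zone manifold
    raise_alarm(status.zone)         # BMS alarm to the operators on shift

# Usage with stub callbacks standing in for the real BMS interface:
on_leak_event(
    ZoneStatus(zone="rack-3/manifold-A", wet=True, confirmed=True),
    close_valve=lambda z: print(f"shutoff valve closed: {z}"),
    raise_alarm=lambda z: print(f"BMS alarm raised: {z}"),
)
```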
How Hosn handles cooling for Omani sovereign racks
Hosn's reference rack ships with DLC cold plates pre-fitted on every GPU sled, an in-rack CDU sized for the configured GPU count, and an RDHx as a finishing stage. Leak sensors, BMS integration, and a one-year on-site service contract are part of the delivered package, not a line item the buyer has to remember to ask for. Pricing is by quotation, scoped to the specific room, climate envelope, and classification tier. Email [email protected] for a one-hour briefing; we will walk through your room's cooling envelope and produce a concrete buildable spec.
Frequently asked
Can a 4-GPU H100 rack still run on air cooling?
A 4x H100 SXM5 chassis sits at roughly 2.8 kW of GPU heat plus host overhead, total around 4 to 5 kW. With cold-aisle inlet at or below 25 degrees Celsius and at least 300 CFM per GPU, air cooling holds. Above that GPU count, or in any 8x H100 or H200 box, sustained training without thermal throttling effectively requires direct liquid cooling or a rear-door heat exchanger.
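A quick sanity check on those numbers, using standard air properties (sea-level density at roughly 25 degrees Celsius):

```python
# Heat an airstream can carry: Q = m_dot * cp * dT.
# Checks the exhaust temperature rise of 300 CFM across one 700 W GPU.

CFM_TO_M3S = 0.000471947   # cubic feet per minute to cubic metres per second
AIR_DENSITY = 1.18         # kg/m^3, ~25 C at sea level
AIR_CP = 1005.0            # J/(kg*K), specific heat of air

def delta_t(heat_w: float, cfm: float) -> float:
    """Air temperature rise in kelvin for a given heat load and flow."""
    m_dot = cfm * CFM_TO_M3S * AIR_DENSITY   # mass flow, kg/s
    return heat_w / (m_dot * AIR_CP)

print(f"700 W across 300 CFM: {delta_t(700, 300):.1f} K rise")
print(f"Halve the flow:       {delta_t(700, 150):.1f} K rise")
```

At the specified flow the per-GPU exhaust rise is only a few kelvin; the real risk is recirculation lifting cold-aisle inlet past 25 degrees Celsius, which is exactly what hot-aisle containment exists to prevent.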
Is immersion cooling worth it for a sovereign deployment of one or two racks?
Rarely. Single-phase immersion captures 100 percent of board heat and removes fans entirely, but tanks run 50,000 to 100,000 US dollars each, dielectric fluid is a consumable, and serviceability is slow. For one or two racks, direct-to-chip cold plates plus a rear-door heat exchanger deliver most of the efficiency at a fraction of the operational complexity. Immersion earns its place at greenfield campus scale, not in a converted ministry server room.
What does retrofit liquid cooling actually cost in Oman?
Budget envelope, by quotation: a rear-door heat exchanger is roughly 3,000 to 5,000 US dollars per rack in hardware, plus chilled-water tap-in. Full direct liquid cooling with a coolant distribution unit lands at 15,000 to 30,000 US dollars per rack. Add Oman-specific items: humidity control, dust filtration, leak detection wired to the building management system, and a service contract with a Gulf-region integrator. Hosn quotes the full delivered package, not just the cold plates.
Where does this article fit in the broader Hosn rack guide?
Cooling is one leg of a three-legged stool: power, cooling, and the air-gap boundary. Read the AI rack power cooling airgap pillar for the full integrated view, including UPS sizing, generator coupling, and how the cooling plant is isolated from the classified network.