UPS Sizing for an AI Rack
A correctly sized UPS is the cheapest insurance policy on a sovereign AI rack, and the easiest place to over-spend or under-spec. GPUs are not ordinary servers: they spike well above their published TDP during gradient steps, and they hate the 4 to 8 millisecond transfer gap a line-interactive UPS imposes. This piece walks through the AI-rack power profile, the three UPS classes worth considering, the sizing math for kVA and runtime, and the orchestration discipline that turns a UPS event into a graceful checkpoint instead of a corrupted run.
The AI-rack power profile
Before sizing anything, you need an honest picture of what an AI rack actually draws. Sustained load is the easy number; transient behaviour is what catches teams who only read datasheets.
- Single-node entry rack (4x H100 SXM5): roughly 4.2 kW at the PDU, of which the GPUs alone account for 2.8 kW (700 W each per the NVIDIA H100 datasheet). Add host CPUs, NVMe, networking, and PSU losses to get the headline figure.
- Mid-density training rack (8x H200 in one HGX node): 10 to 12 kW sustained, with brief 13 to 14 kW transients during bursty all-reduce phases on large batches.
- Mixed inference rack (RTX 6000 Ada plus a Mac Studio cluster): 3 to 5 kW sustained, very flat, but punctuated by short spikes when long-context Gemma or Qwen jobs hit the GPUs.
The transient behaviour matters because GPUs are aggressively clocked. NVIDIA's published TDP is a sustained envelope, not an instantaneous ceiling. Uptime Institute field telemetry consistently shows AI racks pulling 1.2 to 1.4x their nameplate sustained draw for sub-second windows during training, particularly when memory traffic synchronises across NVLink. A UPS sized only for steady-state will trip on those peaks.
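If you want to see this effect on your own hardware rather than take the field telemetry on faith, a few minutes of power sampling during a training step is enough. The sketch below polls nvidia-smi (assumed to be on PATH) and compares mean draw against the observed peak; note that nvidia-smi's power.draw reading is itself a short rolling average, so a metered PDU or DCGM will report even sharper transients than this script catches.

```python
# Sketch: sample total GPU board power during a training run and compare
# sustained draw against the observed peak. Assumes nvidia-smi is on PATH.
import subprocess
import statistics
import time

def gpu_power_watts():
    # One power.draw value per GPU, in watts
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [float(w) for w in out.split()]

samples = []
for _ in range(600):                 # ~60 s at roughly 10 Hz
    samples.append(sum(gpu_power_watts()))
    time.sleep(0.1)

sustained = statistics.mean(samples)
peak = max(samples)
print(f"sustained {sustained:.0f} W, peak {peak:.0f} W, "
      f"peak/sustained ratio {peak / sustained:.2f}")
```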
UPS classes: online double-conversion vs line-interactive vs DC-direct
Three families of UPS topology realistically cover the AI rack market. Only one of them is right for GPU servers today, but the other two are tempting on capex or efficiency grounds, so it is worth being explicit about the trade-offs.
- Online double-conversion. Mains is rectified to DC, then inverted back to clean AC. The load is permanently on the inverter, so there is no transfer event during a grid fault. Output is regulated to within 1 to 2 percent voltage and 0.1 percent frequency, and the harmonic profile is clean enough for any GPU PSU. This is the only correct answer for AI training racks. Capex premium of 30 to 40 percent over line-interactive is real and worth paying.
- Line-interactive. Mains passes through directly during normal operation, with an autotransformer trimming voltage. On a fault the unit switches to inverter, but that switch takes 4 to 8 milliseconds. Most enterprise servers tolerate this; AI training jobs do not. They see the gap as a brownout, the GPU PSUs hiccup, and the in-flight optimiser state is at risk. Acceptable for office gear, networking, and lights-out monitoring. Wrong for GPU compute.
- DC-direct (380 V or 48 V bus). The hyperscaler approach: skip AC inversion entirely and feed the rack from a DC bus backed by a battery string. Best efficiency (PUE benefit of 5 to 8 percent), no inverter to fail, and instantaneous battery handover. The catch is that almost no off-the-shelf GPU server accepts DC input today, so this only fits when you are co-designing the rack. For most sovereign deployments in 2026, DC-direct is a future option, not a present one.
For the standard Hosn Tower and Hosn Rack form factors we default to online double-conversion lithium-iron-phosphate UPS modules from the Schneider Electric, Vertiv, or Eaton catalogues, sized one tier above sustained load to absorb transients.
Sizing math: kVA per rack, headroom, and N+1
The arithmetic for a single rack is simple, but the discipline is in applying it consistently:
- Step 1, real PDU draw. Sum sustained load from every device in the rack, then add 15 to 20 percent for transient peaks. A 4x H100 node sustained at 4.2 kW becomes a 4.8 to 5.0 kW design number.
- Step 2, convert to kVA. Divide by the UPS output power factor, typically 0.9 for modern lithium units. 5 kW at 0.9 PF needs 5.6 kVA, so the next standard size up is a 6 kVA frame.
- Step 3, add headroom for growth. 25 to 30 percent on top of the design number. The 6 kVA single-node example bumps to a 7.5 to 8 kVA UPS. This headroom also keeps the inverter operating in its efficient sweet spot rather than at the redline.
- Step 4, redundancy decision. N+1 means one extra module in a paralleled frame. A regulator-, defence-, or banking-tier workload gets N+1. A development rack does not. Capex bump is 25 to 35 percent of the base UPS cost; downtime risk reduction is roughly an order of magnitude.
For a 12 kW dense training rack the same arithmetic gives 14 kW design, 15.5 kVA at 0.9 PF, with headroom landing on a 20 kVA frame, plus an N+1 module on top for a sovereign tier. That is the correct answer; anything smaller is a false economy.
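A minimal sketch of that four-step arithmetic, using the worked examples above as inputs. The frame-size list and default margins here are illustrative assumptions, not a vendor catalogue.

```python
# Sketch of the four-step UPS sizing arithmetic described above.
STANDARD_FRAMES_KVA = [3, 5, 6, 8, 10, 15, 20, 30, 40]   # assumed typical catalogue steps

def size_ups(sustained_kw: float,
             transient_margin: float = 0.15,   # step 1: 15-20 % for transient peaks
             power_factor: float = 0.9,        # step 2: typical modern lithium UPS PF
             growth_headroom: float = 0.25,    # step 3: 25-30 % growth headroom
             n_plus_one: bool = False):        # step 4: redundancy is a policy decision
    design_kw = sustained_kw * (1 + transient_margin)
    design_kva = design_kw / power_factor
    target_kva = design_kva * (1 + growth_headroom)
    frame_kva = next(f for f in STANDARD_FRAMES_KVA if f >= target_kva)
    modules = 2 if n_plus_one else 1
    return {"design_kw": round(design_kw, 1),
            "design_kva": round(design_kva, 1),
            "frame_kva": frame_kva,
            "modules": modules}

print(size_ups(4.2))                     # 4x H100 node: ~4.8 kW design, 8 kVA frame
print(size_ups(12.0, n_plus_one=True))   # 8x H200 rack: ~13.8 kW design, 20 kVA frame, N+1
```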
Runtime targets for graceful checkpoint and shutdown
Battery runtime is where most teams over-spend. The right framing is not "how long can we ride out an outage" but "how long does the rack need to land safely while the generator starts." The two phases are:
- Phase 1, generator handover (0 to 60 seconds). The UPS holds the rack while the diesel generator starts, synchronises, and assumes load. This is non-negotiable, and it is the dominant runtime case in practice on the Omani grid.
- Phase 2, graceful shutdown (60 seconds to 10 minutes). If the generator fails to start, the orchestration layer must flush optimiser state, checkpoint model weights, drain the inference queue, and bring GPUs to a safe halt. This is where 5 to 10 minutes of additional runtime earns its keep, not 60 minutes.
Total target: 10 to 15 minutes at full load. Pair the UPS with a UPS-aware controller (NUT, PowerChute, or a custom systemd hook) that triggers checkpoint scripts on the inference and training services as soon as the rack goes on battery. Without that orchestration the runtime is wasted; with it, even a worst-case generator failure ends in clean restart instead of corrupted weights.
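As one sketch of what that orchestration can look like with NUT, the watcher below polls upsc for the on-battery flag and fires checkpoint jobs when the rack loses mains. The UPS name and the two systemd units are placeholders for whatever your orchestration layer exposes; in production you would hang the same logic off upsmon's NOTIFYCMD or upssched rather than a polling loop.

```python
# Sketch: trigger graceful checkpoint and drain when the rack goes on battery,
# using NUT's upsc client. UPS name and service names below are placeholders.
import subprocess
import time

UPS = "rack-ups@localhost"
CHECKPOINT_CMDS = [
    ["systemctl", "start", "training-checkpoint.service"],   # flush optimiser state, save weights
    ["systemctl", "start", "inference-drain.service"],        # stop accepting requests, drain queue
]

def on_battery() -> bool:
    # ups.status contains "OB" when running on battery, "OL" when on mains
    status = subprocess.run(["upsc", UPS, "ups.status"],
                            capture_output=True, text=True, check=True).stdout
    return "OB" in status.split()

triggered = False
while True:
    if on_battery() and not triggered:
        for cmd in CHECKPOINT_CMDS:
            subprocess.run(cmd, check=False)
        triggered = True
    elif not on_battery():
        triggered = False          # mains or generator restored; re-arm the trigger
    time.sleep(5)
```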
UPS sizing is one chapter in the broader sovereign AI rack power, cooling, and air-gap playbook every Hosn buyer should have ready before signing a hardware order. Email [email protected] for a one-hour briefing where we run the numbers against your actual rack BOM and grid conditions.
Frequently asked
What kVA UPS does a single AI rack actually need?
Take the worst-case PDU draw, divide by 0.9 power factor, then add 25 to 30 percent headroom. A 4x H100 node at about 4.2 kW lands on 6 to 8 kVA. A dense 8x H200 node at 12 kW lands on 18 to 20 kVA. Always size against transient peaks, not the published TDP, because GPUs spike well above sustained load during gradient steps.
Online double-conversion or line-interactive for GPU servers?
Online double-conversion only. Line-interactive units pass mains through during normal operation and only switch to inverter on a fault, which means the load sees raw grid noise, harmonics, and a 4 to 8 millisecond transfer gap. AI training rigs see that as a brownout and corrupt in-flight checkpoints. Spend the extra capex on a true double-conversion topology.
How long should the UPS run before the generator takes over?
Target 10 to 15 minutes at full load. That is enough time for a maintained diesel generator to start, stabilise, and assume the rack. Inside that window the orchestration layer flushes optimiser state, checkpoints model weights, and brings the GPUs to a safe halt. Chasing 30+ minute runtimes is the wrong investment; fund a tested generator with 24 hours of fuel instead.
When is N+1 UPS redundancy actually justified?
Any rack that supports a regulator, defence-tier, or banking-tier workload. The cost of a redundant UPS module is small compared to a corrupted training run or a downed inference service in a sovereign mission. For development and non-production racks, a single double-conversion unit with a maintenance bypass is acceptable. The decision is policy, not hardware.