Air-Gap Network Architecture for Sovereign AI Clusters
An air-gapped AI cluster is not "the same cluster, but with the WAN cable unplugged". It is a different architecture from the first port up. Routing tables, identity, DNS, time, telemetry, and the model-update workflow all change once the cluster has no path to the internet. This article maps the network architecture sovereign defence and intelligence buyers should expect from a serious vendor, and pairs with the broader AI rack power, cooling, and air-gap guide, which covers the physical-plant requirements alongside the network.
The classified-network reality
A sovereign AI cluster on the classified side of an institution begins with one absolute: no IP route to any public network, ever. Not via NAT, not via a "managed proxy", not via a hypervisor uplink kept "for emergencies". The accelerators, controllers, retrieval index, storage, and inference frontends sit on a routed domain that terminates at a physical boundary. There is no DNS resolver that can leak a query, no NTP server that can be poisoned across the perimeter, and no certificate authority shared with the user-facing internet.
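What "no route, ever" looks like in practice is easy to audit. Below is a minimal sketch of the kind of spot check a security team might script on each cluster node, assuming a Linux host with iproute2; the check itself is illustrative, not drawn from any accreditation standard.

```python
#!/usr/bin/env python3
"""Illustrative spot check: confirm a cluster node has no default route.

Assumes Linux with iproute2 (the -j flag emits JSON). A sketch only, not an
accreditation tool.
"""
import json
import subprocess
import sys

def default_routes() -> list:
    # 'ip -j route show default' prints a JSON list of default routes (empty if none).
    out = subprocess.run(
        ["ip", "-j", "route", "show", "default"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out) if out.strip() else []

if __name__ == "__main__":
    routes = default_routes()
    if routes:
        print(f"FAIL: default route(s) present: {routes}")
        sys.exit(1)
    print("OK: no default route on this node")
```

A failing result on any node is an accreditation finding, not a configuration preference.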
This requirement is codified for U.S. controlled environments by NIST SP 800-171 Revision 3, which permits scope reduction by isolating sensitive components in a dedicated security domain via "physical separation, logical separation, or a combination of both". Facility-side rules for sensitive compartmented information add the construction and TEMPEST envelope under ICD 705. Oman's defence and intelligence buyers translate the same principles through the National Cybersecurity Centre's classified-systems guidance. The labels differ; the architecture does not.
Practical consequences ripple through the build. The cluster runs its own internal CA. Time comes from a stratum-1 GPS-disciplined source physically inside the facility. Logs aggregate to an on-premise SIEM that no remote analyst can reach. Identity is local: there is no "sign in with the corporate IdP" because the corporate IdP lives on the unclassified user LAN.
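To make the time requirement concrete, here is a minimal health-check sketch that asserts a node syncs against the in-facility stratum-1 source. It assumes the third-party ntplib package and a hypothetical internal hostname; both would be swapped for the deployment's actual tooling and names.

```python
"""Illustrative check that cluster time comes from the facility's own
stratum-1, GPS-disciplined source. Hostname and tolerance are assumptions."""
import ntplib

INTERNAL_NTP = "time.cluster.internal"   # hypothetical in-perimeter hostname
MAX_OFFSET_S = 0.05                      # illustrative tolerance

def check_time_source() -> None:
    client = ntplib.NTPClient()
    resp = client.request(INTERNAL_NTP, version=3, timeout=2)
    # Stratum 1 means the server is disciplined directly by a reference clock,
    # here the GPS receiver physically inside the facility.
    assert resp.stratum == 1, f"unexpected stratum {resp.stratum}"
    assert abs(resp.offset) < MAX_OFFSET_S, f"clock offset {resp.offset:.3f}s too large"
    print(f"OK: stratum {resp.stratum}, offset {resp.offset * 1000:.1f} ms")

if __name__ == "__main__":
    check_time_source()
```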
Cross-domain solutions: diodes, guards, and removable media
Air-gapped does not mean "data never moves". Models, threat intelligence, and curated reference corpora must reach the cluster, and approved outputs sometimes need to leave it. The architecture for those crossings is the cross-domain solution layer, governed by NSA's Raise the Bar design strategy and the National Cross Domain Strategy Management Office's accreditation regime.
- Data diodes (one-way transfer). Hardware-enforced unidirectional gateways, sometimes called optical or light-pipe diodes. Bits can physically move in only one direction. Used for ingress of signed model bundles, threat intelligence, and read-only feeds, or for egress of approved logs to an unclassified SOC. There is no return path, so malware cannot exfiltrate over the same wire.
- Guard servers. Bidirectional but heavily mediated: a hardened appliance that inspects every byte against a strict schema, strips metadata, validates digital signatures, and rate-limits in both directions. Used when a workflow genuinely needs round-trip flow (a sanitised query out, a vetted answer in). Modern accredited guards on the Raise the Bar list include products from Forcepoint, BAE Systems (the XTS Guard line), and Owl Cyber Defense.
- Manual removable media. The slowest channel, but in some sovereign environments the only acceptable one for highly classified content. Encrypted media in tamper-evident packaging, signed manifests, dual-control insertion, and an immutable chain-of-custody log. The latency is measured in hours; the assurance is the highest of the three.
Most sovereign AI clusters use all three at different sensitivities. A daily threat-intelligence pull comes through a diode, an operator's federated training updates ride a guard server, and a quarterly model refresh moves on hand-carried media.
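To make the guard's mediation concrete, here is a minimal sketch of the schema-enforcement step alone, using the jsonschema package. The field names and limits are invented for illustration; an accredited guard does far more than this (content inspection, metadata stripping, rate limiting, audit logging).

```python
"""Illustrative sketch of the schema-enforcement step a guard performs on a
round-trip flow. Schema and field names are invented, not taken from any
accredited product."""
from jsonschema import validate, ValidationError

# Hypothetical schema for a sanitised outbound query: fixed fields only,
# bounded lengths, no free-form attachments, unknown properties rejected.
OUTBOUND_QUERY_SCHEMA = {
    "type": "object",
    "properties": {
        "query_id": {"type": "string", "pattern": "^[A-F0-9]{16}$"},
        "query_text": {"type": "string", "maxLength": 2048},
        "classification": {"type": "string", "enum": ["UNCLASSIFIED"]},
    },
    "required": ["query_id", "query_text", "classification"],
    "additionalProperties": False,
}

def guard_inspect(message: dict) -> bool:
    """Return True only if the message matches the schema exactly."""
    try:
        validate(instance=message, schema=OUTBOUND_QUERY_SCHEMA)
        return True
    except ValidationError as err:
        # A real guard would log the rejection to the on-premise SIEM and drop
        # the message; here we just report the reason.
        print(f"REJECT: {err.message}")
        return False
```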
Update workflow: signed bundles and dual control
The model-update path is where most air-gap architectures actually fail. A vendor cannot simply expose a "phone home" URL and ask the institution to whitelist it. The correct pattern is the signed bundle workflow.
- The vendor (or an internal release authority) packages the model weights, tokenizer, runtime container, evaluation report, and a SHA-256 manifest into a single bundle. The bundle is signed by an offline release key that lives in a hardware security module.
- The bundle is delivered to the institution by encrypted removable media or pushed across the diode into a staging zone on the classified side.
- The receiving security team verifies the signature against the release-key public certificate already inside the perimeter, runs malware scans, and inspects the changelog.
- Two named officers from different chains of command issue dual-control approval before the bundle is promoted from staging to the production inference fleet.
- Promotion is gated by a canary phase: a single inference node serves the new bundle behind a load-balancer flag for a defined window, with rollback on alert.
This is slower than a public-cloud auto-update. That is the point. The institution trades hours of latency for an auditable, refusable promotion path that no compromised vendor account can bypass.
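For the verification step itself, a minimal sketch of what the receiving security team's tooling does is shown below, assuming an RSA release key, the cryptography package, and an illustrative manifest.json / manifest.sig layout (an Ed25519 key would use a slightly different verify call).

```python
"""Minimal sketch of receiving-side verification for a signed model bundle:
verify the release signature against the public key already held inside the
perimeter, then recompute the SHA-256 manifest. File names, key type, and
manifest layout are illustrative assumptions."""
import hashlib
import json
from pathlib import Path

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_bundle(bundle_dir: Path, pubkey_pem: Path) -> None:
    manifest_bytes = (bundle_dir / "manifest.json").read_bytes()
    signature = (bundle_dir / "manifest.sig").read_bytes()

    # 1. Verify the manifest signature with the offline release key's public half.
    public_key = serialization.load_pem_public_key(pubkey_pem.read_bytes())
    try:
        public_key.verify(signature, manifest_bytes,
                          padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        raise SystemExit("REJECT: manifest signature invalid")

    # 2. Verify every file listed in the manifest against its SHA-256 digest.
    manifest = json.loads(manifest_bytes)
    for entry in manifest["files"]:          # assumed manifest layout
        actual = sha256(bundle_dir / entry["path"])
        if actual != entry["sha256"]:
            raise SystemExit(f"REJECT: digest mismatch for {entry['path']}")

    print("OK: bundle verified; eligible for dual-control approval")
```

Only after this check passes does the bundle become eligible for the dual-control approval and canary promotion described above.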
Internal network design
Inside the perimeter, network bandwidth is what determines whether the appliance hits its latency targets. Tensor parallel inference moves activations between GPUs at line rate, and a 70B model split across two H200 cards needs about 100Gbps between accelerators to avoid stalling. Multi-node retrieval and embedding traffic between the inference frontend and the vector index typically needs 25Gbps. Out-of-band management runs on a separate 1Gbps or 10Gbps fabric, never sharing physical paths with data. See 100GbE versus 25GbE in an AI cluster for the per-link sizing math.
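The 100Gbps figure is not arbitrary. A rough sizing sketch, with illustrative assumptions (70B-class model dimensions, an assumed aggregate decode rate, two activation all-reduces per layer), lands in the same range:

```python
"""Back-of-envelope sizing for tensor-parallel east-west traffic. Model
dimensions are typical of a 70B-class transformer; the throughput figure and
the two-all-reduces-per-layer assumption are illustrative, not measured."""

HIDDEN = 8192              # model width (70B-class)
LAYERS = 80                # decoder layers
BYTES_PER_ACT = 2          # fp16/bf16 activations
ALLREDUCES_PER_LAYER = 2   # one after attention, one after the MLP
TOKENS_PER_SEC = 3000      # assumed aggregate decode throughput on the node

# Activation payload exchanged between the two GPUs per generated token:
per_token_bytes = HIDDEN * BYTES_PER_ACT * ALLREDUCES_PER_LAYER * LAYERS

gbps = per_token_bytes * TOKENS_PER_SEC * 8 / 1e9
print(f"{per_token_bytes / 1e6:.1f} MB per token -> ~{gbps:.0f} Gbps sustained")
# ~2.6 MB per token -> ~63 Gbps sustained, before prefill bursts and protocol
# overhead, which is why 100Gbps is the sensible floor for the accelerator link.
```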
The cluster is also segmented from the user LAN that staff actually authenticate against. A typical layout puts inference accelerators, retrieval, and storage on a "compute fabric" VRF; the user-facing web frontend on a "presentation" VRF; and the cross-domain ingress on a third. Inter-VRF flows pass through inspection devices with explicit allow-lists. The user LAN itself, which connects to email and the corporate IdP, sits on the unclassified side of the boundary entirely and reaches the inference frontend only through whatever guard or terminal-services pattern the institution has accredited.
Monitoring and incident response without internet
An air-gapped cluster cannot stream telemetry to a hyperscale SIEM. Every observability primitive lives inside the perimeter. Logs aggregate to an on-premise stack (Elastic, OpenSearch, or Splunk Enterprise) that runs on a dedicated security domain. Metrics scrape into a local Prometheus and Grafana. Alerts route to dashboards visible from designated incident-response workstations, not to internet pagers. Signature updates for endpoint security and intrusion detection arrive through the same diode-and-bundle workflow as model weights, on a documented cadence.
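One small example of keeping observability inside the perimeter: exposing the age of the most recent signature bundle as a local Prometheus metric, so the on-premise stack can alert when the documented ingest cadence slips. A sketch assuming the prometheus_client package and a hypothetical staging path:

```python
"""Illustrative exporter: publish the age of the newest signature bundle in
the diode staging zone as a local Prometheus metric. Path and port are
hypothetical."""
import time
from pathlib import Path

from prometheus_client import Gauge, start_http_server

STAGING_DIR = Path("/srv/cds/staging/signatures")   # hypothetical landing zone
bundle_age = Gauge(
    "signature_bundle_age_seconds",
    "Seconds since the newest signature bundle landed in staging",
)

def refresh() -> None:
    newest = max((p.stat().st_mtime for p in STAGING_DIR.glob("*.bundle")), default=0)
    bundle_age.set(time.time() - newest if newest else float("inf"))

if __name__ == "__main__":
    start_http_server(9102)   # scraped only by the in-perimeter Prometheus
    while True:
        refresh()
        time.sleep(60)
```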
Out-of-band escalation, when an analyst genuinely needs to talk to the outside world, runs on an explicit human channel: a separate unclassified workstation in the SOC, a phone call, a courier. There is no automatic webhook to a SaaS pager. Incident-response runbooks bake this in: every step that would, on a normal cluster, generate an outbound network call is rewritten as a deliberate human action with logged authorisation.
The disciplined air-gap is slower, more boring, and dramatically harder to compromise than any internet-attached architecture. It is also a prerequisite for sovereign AI workloads at the level of defence and internal-security ministries. To walk through the network architecture, cross-domain options, and update workflow for a specific sovereign deployment, email [email protected] or message +968 9889 9100. Pricing is by quotation, sized to the threat model and concurrency.
Frequently asked
What does air-gap actually mean for an AI cluster?
Physical separation. The cluster has no IP route to any public network, no transit to the institution's internet-facing user LAN, and no shared DNS, identity, or update infrastructure with the outside world. Updates and data crossings happen through deliberate, audited paths: a one-way data diode for ingress, a guard server for tightly schema-validated bidirectional flows, or removable media under dual-control approval. The internal cluster fabric is its own routed domain, segmented from the user LAN that staff actually log in from.
How do model updates reach an air-gapped AI cluster?
Through a signed bundle workflow. The model weights, tokenizer, runtime container, and changelog are packaged on the high side of the perimeter, signed by the build authority, dropped onto encrypted removable media or pushed across a one-way data diode into a staging area, scanned by the receiving security team, dual-control approved, and only then promoted into the inference fleet. The chain of custody is logged at every step. There is no scheduled background pull from a vendor URL because there is no route to one.
Why does the internal cluster need 100GbE if it cannot reach the internet?
Tensor parallel inference and multi-node training move tens of gigabytes of activations and gradients between accelerators every second. A 70B model split across two GPUs in tensor parallel can saturate 100Gbps between cards, while retrieval and embedding traffic between nodes typically needs 25Gbps. Internet bandwidth is irrelevant to the cluster. Internal east-west bandwidth between accelerators, storage, and the retrieval index is what determines whether the appliance hits its latency targets.
How do you do incident response on an air-gapped cluster with no SOC feed?
Telemetry stays inside. The cluster runs its own log aggregator, SIEM index, and alerting console reachable only from a designated incident-response workstation on the same classified domain. Alerts surface to security staff via dashboards, not via internet-routed pagers. Threat intelligence and signature updates arrive via the same signed bundle workflow that delivers models, on a defined cadence (weekly, daily, or per-incident). Out-of-band breach notification, if the institution requires it, goes through an explicit human courier channel, never through an automatic webhook.