The CLOUD Act, China's DSL, and Why GCC Sovereign Data Cannot Live in Public Cloud AI

Three legal regimes now intersect on every prompt a GCC institution sends to a public-cloud AI service. The United States, through the CLOUD Act of 2018, claims authority over data held by US providers regardless of where the bytes sit. China, through Article 36 of the Data Security Law and Article 41 of the Personal Information Protection Law, restricts any cross-border disclosure to foreign authorities and binds Chinese providers to Beijing's permission regime. Oman, through Royal Decree 6/2022 and its enforcement phase that began on 5 February 2026, conditions cross-border transfer of personal data and protects national-security and state-economic interests by carve-out. For a sovereign workload in the Gulf, those three regimes do not add up. They cancel out the option of running sensitive material through any public-cloud AI, hyperscaler-tier or otherwise. This article walks through why, and what to do about it.

What the CLOUD Act actually requires

The Clarifying Lawful Overseas Use of Data Act, signed in March 2018, amended the Stored Communications Act of 1986 to settle a question the Microsoft Ireland litigation had pushed to the US Supreme Court: when a US-incorporated provider holds customer data on a server abroad, can a US warrant reach that data? The CLOUD Act answered yes. As the statutory provision puts it, an electronic communication service or remote computing service provider must "preserve, backup, or disclose the contents of a wire or electronic communication and any record or other information pertaining to a customer or subscriber within such provider's possession, custody, or control, regardless of whether such communication, record, or other information is located within or outside of the United States."

The phrase that matters is "possession, custody, or control." It follows the corporate parent, not the data centre. AWS, Microsoft, Google, Oracle, IBM, Apple, and Meta are US persons under this definition. So is every smaller US AI provider. A warrant or qualifying subpoena issued under US law reaches their global estate. The data does not need to leave the Gulf for US law to reach it. The provider does.

The CLOUD Act also created a second mechanism, the executive agreement, which lets approved foreign governments issue orders directly to US providers for data about non-US persons. The United Kingdom's agreement entered into force on 3 October 2022 and has reportedly carried more than twenty thousand orders to US companies. The Australia agreement entered into force on 31 January 2024. Other countries are negotiating. The geography of who can compel a US provider keeps growing.

What the CLOUD Act did not do is create a process for data-subject notification, an opt-out for foreign regulators, or any meaningful exception for state-linked customers. A provider's contractual confidentiality promise to a Gulf ministry is, in this regime, subordinated to a US legal order. The provider can challenge an order, but the default obligation is disclosure.

The China DSL mirror, plus PIPL

China's framework is the structural mirror of the CLOUD Act, written from the receiving end. The Data Security Law took effect on 1 September 2021. Article 36 reads, in substance, that domestic organisations and individuals shall not provide data stored within the mainland territory of the People's Republic of China to foreign judicial or law-enforcement institutions without the approval of the competent Chinese authority. Article 41 of the Personal Information Protection Law, in force from 1 November 2021, mirrors that prohibition for personal information specifically. Penalties for unapproved disclosure include fines of up to RMB 5 million, suspension of business operations, and licence revocation.

This means a Chinese-controlled cloud or AI provider, anywhere in the world, has a structural duty to refuse a foreign legal order unless Beijing has approved compliance. From the perspective of a GCC institution, that has the same shape as the CLOUD Act, just with a different national gatekeeper. If the customer is a sovereign institution and the provider is a Chinese person, the operator's first allegiance is to the Chinese state, not to the contractual relationship with the Gulf customer. China also requires that "important data" and personal information collected by Critical Information Infrastructure Operators be stored inside China, which means a Gulf workload routed through a Chinese-controlled service may face onward localisation obligations the customer never agreed to.

The two regimes, US and Chinese, do not collide in practice, because most sovereign procurement does not run both rails simultaneously. They collide categorically. Whichever rail you choose, the operator answers to a foreign government before it answers to you.

The structural problem for hyperscaler AI

Cloud storage is one thing. AI services are something else. When a sovereign workload moves from "store our documents" to "let our staff ask an LLM about our documents," the surface area in scope of foreign warrants expands in three ways most procurement reviews still miss.

The first expansion is the prompt itself. Every question a user types is a record in the provider's possession. A senior official summarising a confidential briefing is creating a new artefact, identical in sensitivity to the briefing, that now lives in the provider's logging, abuse-monitoring, and safety-pipeline storage. The provider's terms of service usually permit retention of these prompts for security, training, or product-improvement purposes. Under the CLOUD Act, those retained prompts are reachable. Under the DSL and PIPL, their cross-border movement is restricted.

The second expansion is retrieval-augmented generation. RAG works by indexing the customer's own documents into vector embeddings that the model retrieves at inference time. Those embeddings are not "just numbers." Embedding-inversion attacks can recover the underlying text with high fidelity, especially for short, sensitive passages, and most vector stores keep the raw chunk text alongside each vector in any case. A provider holding a sovereign customer's RAG index is holding a derivative work of that customer's confidential corpus. The legal exposure follows.
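The shape of that exposure is easy to see in code. Below is a minimal, illustrative sketch of a RAG index, with a toy hash-based embedder standing in for a real embedding model. Like most production vector stores, it keeps the raw chunk text next to each vector so the chunk can be returned at inference time, which means the index is literally a copy of the corpus. All names and the two sample chunks are hypothetical.

```python
import hashlib
import math

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy bag-of-words embedding; real systems use a learned model."""
    vec = [0.0] * dim
    for word in text.lower().replace(".", "").split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorIndex:
    """In-memory RAG index: each entry stores the vector AND the raw text."""

    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        # The provider's store now holds both the embedding and the chunk.
        self.entries.append((embed(chunk), chunk))

    def query(self, question: str) -> str:
        qv = embed(question)
        best = max(self.entries,
                   key=lambda e: sum(a * b for a, b in zip(e[0], qv)))
        return best[1]  # the confidential chunk comes back verbatim

index = VectorIndex()
index.add("Stress test results show a capital shortfall at the bank.")
index.add("Procurement schedule for the coastal radar upgrade.")
print(index.query("stress test capital shortfall"))
```

Whoever holds `index.entries` holds the corpus. The vectors add retrievability; they do not subtract content. That is why a provider-hosted RAG index carries the same legal exposure as the documents themselves.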

The third expansion is fine-tuning. Adapter weights produced by training on a customer's data encode that data into the model's parameter space. Membership-inference and data-extraction attacks demonstrate that training data can be identified and partially reconstructed from a fine-tuned model. A foreign legal order reaching the provider's fine-tune storage reaches a transformed copy of the customer's training corpus. The customer's contractual deletion request does not, by itself, bind a foreign court.

Storage exposes documents. AI exposes documents, plus the questions users ask about them, plus the embeddings of those documents, plus the parameters trained from them. That is four artefacts where there used to be one.

GCC-side exposure, both directions

From a Gulf procurement seat, the legal map has two columns. The US-tier column lists AWS, Microsoft Azure, Google Cloud, Oracle, IBM, Anthropic, OpenAI, and a long tail of US AI startups. Every name in that column is in scope of the CLOUD Act, irrespective of whether they have opened a "sovereign region" in Riyadh, Abu Dhabi, Doha, or Muscat. The Chinese-tier column lists Alibaba Cloud, Huawei Cloud, Tencent Cloud, Baidu's AI services, and the Chinese-controlled accelerator ecosystem. Every name in that column is in scope of DSL Article 36 and PIPL Article 41.

A GCC institution that splits sensitive workloads across both columns has not diversified its sovereign risk. It has doubled it. The data is now reachable, in different ways and by different states, from two foreign capitals at once. The only column that does not have that problem is the in-country operator, locally incorporated, locally staffed, and not a subsidiary of a foreign group whose home statute would override the local contract.

This is why "where the data sits" has stopped being the right question. The right question is "whose statute governs the entity that touches the data." For a sovereign workload, the answer must be a single, local jurisdiction.

The Omani PDPL as the third constraint

Oman's Royal Decree 6/2022 codifies the third leg of the framework. Article 23 conditions cross-border transfer of personal data on standards set by the executive regulations and on the data subject's explicit consent unless the transfer is fully anonymised. Article 3 carves out processing performed for the protection of national security, the higher interests of the State, and its economic and financial interests. The transition period concluded on 5 February 2026, and the regulator is now in active supervisory mode, with administrative penalties available for violations.

Two consequences flow from this. First, any sovereign-tier workload routed through a foreign provider invokes the Article 23 cross-border transfer regime, which requires both consent and an importer with adequate protection. Adequate protection is hard to claim when the importer's home statute permits compelled disclosure to a foreign court. Second, the national-security and economic-interest carve-out does not authorise transfer. It heightens controls. A central-bank stress test or a defence procurement file is not exempt from scrutiny. It is more sensitive, not less.

The PDPL does not single-handedly prohibit public cloud AI for sovereign workloads. It contributes a third reason, on top of the CLOUD Act and the DSL/PIPL, why public cloud AI is the wrong tool for the job.

Why AI workloads are higher-risk than storage

Five operational properties of modern AI services raise their risk profile above traditional cloud storage.

Telemetry is continuous. A storage system writes when you upload. An inference system writes on every keystroke, every retrieval, every model call. The volume of records describing the customer's behaviour is orders of magnitude higher.

Fine-tunes are durable. A document deleted from a bucket is gone. A document that has trained an adapter persists in that adapter's parameters until the adapter is retired. The provider's "delete" button does not necessarily reach the parameters, only the source.

Embeddings are reversible enough. Vector representations of confidential text can be projected back to recognisable substance, particularly for short passages with distinctive vocabulary, the exact shape of most ministerial briefings.

Safety pipelines are sticky. Prompts that trigger abuse, jailbreak, or content-policy classifiers are routinely retained for human review even when the provider's general retention window is short. A user whose question brushes against any safety classifier has put that prompt into a longer-lived store than they expected.

Multi-tenant inference shares hardware. Side-channel research against shared GPUs is an active field. Sovereign workloads that share an accelerator with arbitrary tenants run a category of risk that single-tenant on-premise hardware does not.

None of these properties makes public cloud AI useless. They make it the wrong tool for sovereign categories. In the same way that no minister would dictate an intelligence briefing into a third-party transcription service, no sovereign workload should be funnelled through a third-party AI service whose operator owes its first duty to a foreign capital.

On-premise as the only category-clean answer

The clean answer to a multi-jurisdictional problem is to remove the foreign jurisdictional nexus. On-premise sovereign AI does that by construction. The hardware lives inside the institution's facility. The operating system, model weights, RAG index, fine-tune adapters, and inference logs all sit on storage the institution owns. The operators are local employees or local contractors accountable to local law. There is no foreign provider in the chain who could be served with a warrant, because there is no foreign provider in the chain.

This is not a theoretical posture. Open-weight models in 2026 (Gemma 4, Qwen 3.6, DeepSeek R1, Falcon Arabic) cover almost every institutional task, and the hardware to run them runs from a single workstation up to a small rack. The economics that used to force sovereign workloads onto hyperscaler AI no longer hold. A directorate-scale deployment fits comfortably inside one facility, with concurrency, latency, and quality benchmarks that match or exceed last year's frontier closed services. Hosn ships three reference tiers under that pattern. The pattern itself, not the vendor, is the point.

At the national level, Oman has a complementary public asset in Mu'een, the shared sovereign-AI platform for cross-government use cases. Mu'een and on-premise systems answer different questions: shared workloads where the ministry is comfortable pooling, versus institution-specific workloads that should not leave the institution at all.

What sovereign cloud regions do and do not fix

Sovereign cloud regions, the localised versions of hyperscaler offerings often named for the host country, do solve some problems. They reduce latency. They satisfy data-residency clauses that are written purely about location. They simplify some procurement reviews that stop at the question of where the bytes rest. None of these are nothing.

What they do not solve is the jurisdictional question. A US-incorporated hyperscaler operating a Gulf region is still a US person. A Chinese-incorporated provider operating a Gulf region is still a Chinese person. The control plane of the region typically sits in the parent's home country. The encryption key custody, the operator identity, the software supply chain, and the corporate legal personality all point home. The European experience is instructive. The European Commission has explicitly stated, in its Cloud Sovereignty Framework and the EU Data Act, that geographic localisation is not the same as legal sovereignty. Even GAIA-X, the EU's flagship federated cloud project, has had to wrestle publicly with the participation of US-controlled members under CLOUD Act exposure.

For a GCC institution, the right hierarchy is straightforward. Public cloud, including sovereign regions, is acceptable for non-sensitive workloads where the value of latency, scale, and managed services exceeds the residual jurisdictional exposure. Sovereign cloud regions are an upgrade for workloads where data residency satisfies the requirement. On-premise sovereign AI is the answer for the workloads where jurisdictional exposure itself is the requirement: defence, intelligence, internal security, central-bank stress tests, sovereign-fund deal flow, regulator-sensitive supervisory data, and the tier of ministerial work that should never be readable from any foreign capital.

If your institution is reviewing an existing public-cloud AI deployment against this framework, or planning a new sovereign workload from a clean sheet, the next step is a one-hour briefing tailored to your concurrency, classification, and integration requirements. Email [email protected] or message +968 9889 9100. We will come to you, in Muscat or anywhere in the GCC, and walk through the architecture, the legal posture, and a credible plan against your timeline. Pricing is by quotation, sized to your specific requirement.

Frequently asked

Isn't a sovereign cloud region in the GCC enough?

Not for the workloads we are talking about. A sovereign region changes where bytes physically rest, not who legally controls the operator. A US-incorporated hyperscaler operating a Gulf region is still a US person under the CLOUD Act, which obliges it to disclose data in its possession, custody, or control regardless of location. A Chinese-incorporated provider is bound by Article 36 of the Data Security Law and Article 41 of PIPL, which restrict cross-border disclosure to foreign authorities and put the operator under the supervision of Beijing rather than the customer. Geography is necessary but not sufficient. Jurisdiction is what matters, and only an in-country operator under local law removes the foreign-warrant exposure.

What about EU adequacy decisions or the EU-US Data Privacy Framework?

Adequacy decisions address transfers under European data protection law. They do not neutralise the CLOUD Act. The EU itself recognises this tension. The EU Data Act requires cloud providers to implement safeguards against third-country access to non-personal data when that access would conflict with EU law, but it does not prohibit US-controlled providers from operating in Europe. For a GCC sovereign workload, EU adequacy is not the relevant question. The relevant question is whether your AI provider can be compelled by any foreign authority to produce or restrict access to your data, and the only clean answer is to remove the foreign legal nexus entirely.

Do the same rules apply to open-source model weights?

The weights themselves are not the issue once they are inside your perimeter. The issue is the runtime that serves them. If the weights run inside your facility on hardware you own under operators accountable to local law, the foreign-warrant exposure does not arise. If the same weights run inside a hyperscaler's managed AI service, all the prompts, context, embeddings, and inference logs become data in that hyperscaler's possession, custody, or control. Open weights enable sovereignty. They do not deliver it on their own.

Is GCC data already in scope of US warrants today?

Yes, when it is held by a US-controlled provider. The Microsoft Ireland litigation put the question before the Supreme Court; the CLOUD Act resolved it in favour of disclosure, regardless of where the bytes sit. A GCC bank, ministry, or sovereign fund using a US hyperscaler is, under US law, holding its data with a US person who can be compelled to produce it. The data does not need to leave the Gulf for the US to reach it. That is the structural point most procurement teams in 2026 still under-weight.

What about on-premise appliances with cloud-managed control planes?

An appliance whose control plane lives in a foreign cloud is not a sovereign appliance. The vendor can update, suspend, audit, or, under sufficient legal pressure, disable the device remotely. For sovereign workloads, the test is the air-gap test. Cut the link to the vendor's cloud. If the system stops working, the system is not sovereign. A genuine sovereign appliance keeps its inference, identity, audit, and update channels inside the institution's perimeter, with explicit, signed, offline update bundles rather than a continuous management tunnel.
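The air-gap test can be expressed as a one-function checklist. This is an illustrative sketch, assuming a hypothetical component inventory; the component names and the `needs_vendor_cloud` flag are stand-ins for whatever your architecture review actually records.

```python
# Hypothetical inventory: which runtime functions stop working when the
# link to the vendor's cloud is cut. All component names are illustrative.
COMPONENTS = {
    "inference":      {"needs_vendor_cloud": False},
    "identity":       {"needs_vendor_cloud": False},
    "audit_logging":  {"needs_vendor_cloud": False},
    "update_channel": {"needs_vendor_cloud": True},  # continuous management tunnel
}

def survives_air_gap(components: dict) -> bool:
    """Sovereign only if every component keeps working with the vendor link cut."""
    return not any(c["needs_vendor_cloud"] for c in components.values())

print(survives_air_gap(COMPONENTS))  # False: the management tunnel fails the test

# Swapping the tunnel for signed, offline update bundles passes it.
COMPONENTS["update_channel"] = {"needs_vendor_cloud": False}
print(survives_air_gap(COMPONENTS))  # True
```

The point of writing it down this way is that the test is binary per component: one dependency on the vendor's cloud fails the whole system, which matches the "cut the link" framing above.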

How do I de-risk an existing public-cloud AI deployment?

Three steps, in order. First, classify what is already flowing. List every prompt source, document store, and integration that touches the cloud AI service today, and tag each by sensitivity against your institution's classification policy. Second, contain the highest tier. The handful of workloads that touch state-economic, defence, intelligence, or regulator-sensitive material should be moved to an on-premise sovereign system on a defined timeline, typically eight to sixteen weeks. Third, accept the public service for the residual. Marketing copy, internal IT helpdesk content, and other non-sensitive material can stay in the cloud. The goal is category cleanliness, not blanket prohibition.
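The triage above can be sketched as a tagging pass. The tier names and tag vocabulary below are illustrative assumptions, not a real classification policy; substitute your institution's own scheme.

```python
# Tags that force step two, "contain the highest tier", on-premise.
# The tag vocabulary is hypothetical.
SOVEREIGN_TAGS = {"state-economic", "defence", "intelligence", "regulator-sensitive"}

def placement(tags: set[str]) -> str:
    """Step one output: route each workload by its sensitivity tags."""
    if tags & SOVEREIGN_TAGS:
        return "on-premise"    # step two: contain the highest tier
    return "public-cloud"      # step three: accept the residual

# Illustrative inventory from step one's listing exercise.
inventory = {
    "stress-test-qa":  {"state-economic"},
    "helpdesk-bot":    {"internal-it"},
    "marketing-draft": {"public"},
}
for name, tags in inventory.items():
    print(f"{name}: {placement(tags)}")
```

A single shared sovereign tag is enough to pull a workload on-premise; the residual stays where it is. That is the "category cleanliness, not blanket prohibition" rule in executable form.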