On-Premise AI for Omani Law Firms: Legal Research Inside the Office
The top tier of the Omani Bar, the Said Al Shahry, Al Busaidy Mansoor Jamal, and Curtis Mallet generation of firms, sits on three decades of negotiated work product, sovereign retainers, and family-office matters. That archive is the firm. AI that improves how associates search it, draft from it, and translate against it is the largest productivity move available in 2026. AI that exfiltrates it to a foreign cloud is the largest privilege risk. This piece is about how to capture the first without buying the second: the architecture, the hardware, and a realistic first six months.
Privilege and confidentiality: why cloud LLMs are a non-starter
The legal landscape on public-cloud AI moved fast in early 2026. In United States v. Heppner on 17 February 2026, Judge Jed Rakoff of the Southern District of New York held that materials a defendant created using consumer AI tools were not protected by attorney-client privilege or work-product doctrine, because submitting them to the tool was a third-party disclosure, as summarised by Morgan Lewis. A week earlier, the Eastern District of Michigan in Warner v. Gilbarco held that AI use directed by counsel for litigation preparation can remain work-product protected. The dividing line is the firm's control. Counsel-directed, perimeter-bounded, logged use survives. Free-tier consumer chatbot use does not.
The Omani Bar has not yet ruled on this question, but the conservative reading of the Personal Data Protection Law issued under Royal Decree 6/2022 places client matter files inside the controller's responsibility. Pasting a draft concession agreement into ChatGPT or Claude transfers the document to a foreign operator that is subject to the United States CLOUD Act, that often retains it for service improvement, and whose employees can access it under the operator's security policy rather than the firm's. Three problems compound: privilege on the matter, data-protection obligations toward the client, and conflict-check leakage when the matter description itself reveals an unannounced deal. There is no enterprise wrapper that fully resolves all three on a foreign cloud, because the data still leaves the perimeter.
The on-prem alternative: local Qwen 3.6 plus RAG over the firm corpus
The credible architecture is the same shape that Stanford RegLab evaluated in 2024, retrieval-augmented generation, but served on hardware inside the firm with an Arabic-capable open-weight model. The retrieval stage pulls relevant passages from a vector index of the firm's matter database, prior memoranda, deal precedents, and the public Omani corpus. The generation stage feeds those passages plus the lawyer's question into the model and produces an answer with explicit citations. If no retrieved passage clears the confidence threshold, the system refuses to answer rather than confabulating one.
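A minimal sketch of that retrieve-then-generate loop with the refusal rule made explicit. The embed, vector_index.search, and generate helpers, and the threshold value itself, are placeholders for the firm's own embedding model, on-prem vector store, and local model endpoint, not any specific product's API.

```python
# A sketch, not a product API: embed(), vector_index.search() and generate()
# stand in for the firm's embedding model, on-prem vector store and local LLM.
REFUSAL_THRESHOLD = 0.72   # illustrative value; the firm tunes this per corpus

def answer(question: str, top_k: int = 8) -> dict:
    query_vec = embed(question)                      # on-prem embedding model
    hits = vector_index.search(query_vec, k=top_k)   # each hit: passage, citation, score

    # Keep only passages the retriever is confident about.
    supported = [h for h in hits if h.score >= REFUSAL_THRESHOLD]
    if not supported:
        # Nothing clears the threshold: refuse rather than confabulate.
        return {"status": "refused", "answer": None, "citations": []}

    context = "\n\n".join(f"[{h.citation}] {h.passage}" for h in supported)
    prompt = ("Answer using only the passages below and cite the bracketed "
              f"source for every claim.\n\n{context}\n\nQuestion: {question}")
    return {"status": "answered",
            "answer": generate(prompt),              # local Qwen / Gemma / Falcon
            "citations": [h.citation for h in supported]}
```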
Stanford found Westlaw AI and Lexis+ AI hallucinated on 17 to 33 percent of legal queries despite their RAG architectures. The on-premise pattern does not magically erase that ceiling. What it changes is who tunes the threshold and who owns the audit trail. The firm picks the citation-audit rules, samples outputs nightly, and sets the regression threshold above which the knowledge-management lead is paged. Casetext CoCounsel pioneered the commercial version of this pipeline; the on-premise version replaces the vendor cloud with a Hosn appliance and replaces the US case-law corpus with the firm's own work product plus the Omani Sultani Decree corpus.
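What that ownership looks like operationally is a small nightly job rather than a vendor promise. A hedged sketch: sample_todays_answers, passage_supports, and page_km_lead are hypothetical hooks into the firm's inference log, review workflow, and alerting, and the five percent threshold is illustrative.

```python
# Nightly citation-audit sketch. The helpers are hypothetical hooks into the
# firm's own logging, review workflow, and paging; the threshold is illustrative.
REGRESSION_THRESHOLD = 0.05   # page the KM lead if more than 5% of sampled answers fail

def nightly_citation_audit(sample_size: int = 50) -> float:
    answers = sample_todays_answers(sample_size)     # drawn from the inference log
    failures = 0
    for a in answers:
        # An answer fails the audit if any cited passage does not support its claim.
        if not all(passage_supports(claim, passage)
                   for claim, passage in a.claim_passage_pairs):
            failures += 1
    failure_rate = failures / max(len(answers), 1)
    if failure_rate > REGRESSION_THRESHOLD:
        page_km_lead(f"Citation audit failure rate {failure_rate:.1%} "
                     f"exceeds the {REGRESSION_THRESHOLD:.0%} threshold")
    return failure_rate
```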
The model layer is open-weight Qwen 3.6 in 27B or 70B Arabic-tuned variants for primary research and drafting, Gemma 4 with 256K context for whole-decree ingestion, and Falcon Arabic from the UAE Technology Innovation Institute as an Arabic-first alternative. All three run with no outbound traffic. This is the on-premise AI for sovereign institutions pattern applied to a private-sector buyer.
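Keeping inference inside the perimeter does not change how tools talk to the model. A minimal client-side sketch, assuming the appliance exposes an OpenAI-compatible endpoint (the pattern served by vLLM or llama.cpp); the hostname, port, and model tag are placeholders for whatever the firm actually deploys.

```python
# Hedged sketch of a client call against the appliance's local endpoint.
# The address and model tag are placeholders; nothing here leaves the firm network.
from openai import OpenAI

client = OpenAI(
    base_url="http://hosn.firm.local:8000/v1",   # resolves only inside the perimeter
    api_key="unused",                            # local server, no vendor key
)

response = client.chat.completions.create(
    model="qwen-70b-arabic-instruct",            # whichever local model the firm serves
    messages=[
        {"role": "system", "content": "Answer only from the passages provided and cite them."},
        {"role": "user", "content": "Summarise the termination provisions in the retrieved clauses."},
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
```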
Hardware sizing for 10-, 30-, and 100-lawyer firms
The hardware tiers map to concurrency, not headcount.
- 10-lawyer boutique, Hosn Kernel. A Mac Studio M3 Ultra with 512 GB unified memory runs a 70B model quantised, serves five to ten concurrent users, and indexes a corpus up to a few hundred gigabytes. Fits in a partner's office on a normal power circuit. Right for boutique commercial, family-office, or sovereign-advisory firms.
- 30-lawyer mid-market, Hosn Tower. A workstation with one NVIDIA RTX PRO 6000 Blackwell 96 GB card runs a 70B model at FP8 with 26 GB of headroom for KV cache and serves twenty to fifty concurrent lawyers; the sizing sketch after this list shows the arithmetic. ECC memory and certified drivers make it production-grade rather than enthusiast hardware.
- 100-lawyer national firm, Hosn Rack. Two- to four-GPU servers with H100 or H200 acceleration sit in the firm's server room or a colocation rack the firm controls, behind the firm's firewall, with HSM key custody. Pricing for all three tiers is by quotation because the corpus, the SSO integration, and the air-gap policy vary per firm.
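The Tower figures are easy to sanity-check. A back-of-envelope sketch, assuming FP8 weights at roughly one byte per parameter and a Llama-70B-style grouped-query cache layout (80 layers, 8 KV heads, head dimension 128, FP8 cache); the exact numbers depend on the model actually deployed.

```python
# Back-of-envelope sizing for the Tower tier; architectural constants are assumptions.
weight_gb = 70                      # 70B parameters at ~1 byte each under FP8
card_gb = 96
headroom_gb = card_gb - weight_gb   # ~26 GB left for KV cache and activations

kv_bytes_per_token = 2 * 80 * 8 * 128         # K+V x layers x KV heads x head dim, 1 byte each
tokens_in_cache = headroom_gb * 1024**3 // kv_bytes_per_token

print(f"weights ~{weight_gb} GB, headroom ~{headroom_gb} GB")
print(f"~{tokens_in_cache:,} cached tokens shared across concurrent sessions")
```

On those assumptions the card holds roughly 170,000 cached tokens, which at research contexts of a few thousand tokens per active session is what makes twenty to fifty concurrent lawyers plausible.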
Practical first six months
A realistic phase plan for a thirty-lawyer firm.
- Month 1, scope and corpus inventory. Map practice areas, language mix, and confidentiality classes. Inventory matter files, prior memos, the clause library, and external sources to mirror.
- Month 2, hardware landing and air-gap. The Tower arrives; network isolation, firm SSO, HSM key custody, and the on-prem vector index are stood up. Mu'een, Oman's national shared AI platform, serves as a public-corpus complement for non-confidential queries.
- Month 3, corpus indexing. Ingest the document management system. Mirror the public Sultani Decree corpus. Build article-level chunking and bilingual alignment (a chunking sketch follows this list).
- Month 4, partner pilot. Three to five partners run live matters in parallel with their normal workflow. Citation-audit thresholds set. Hallucination regression dashboard live.
- Month 5, drafting layer. The firm's clause library and house style encoded as adapters. Bilingual drafting templates approved by senior partners.
- Month 6, firm-wide rollout. Associates trained, knowledge-management oversight staffed, monthly partner report on usage, accuracy, and recovered hours.
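The article-level chunking in month three is mostly mechanical once decree formatting is profiled. A minimal sketch, assuming articles are headed "Article N" in English or "المادة N" in Arabic; real gazette formatting varies, and decree_id stands in for the firm's own citation scheme.

```python
# Sketch of article-level chunking for a decree text. The heading pattern and the
# decree_id scheme are assumptions; real gazette formatting needs per-source profiles.
import re

ARTICLE_HEAD = re.compile(r"^(?:Article|المادة)\s+([0-9٠-٩]+)", re.MULTILINE)

def chunk_by_article(decree_id: str, text: str) -> list[dict]:
    """Return one chunk per article, each carrying a citable identifier."""
    heads = list(ARTICLE_HEAD.finditer(text))
    chunks = []
    for i, head in enumerate(heads):
        start = head.start()
        end = heads[i + 1].start() if i + 1 < len(heads) else len(text)
        chunks.append({
            "chunk_id": f"{decree_id}:art-{head.group(1)}",   # e.g. "RD-6-2022:art-5"
            "text": text[start:end].strip(),
        })
    return chunks
```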
The output is a research surface that compresses associate hours, a drafting surface that enforces house style, and a confidentiality posture the firm can defend to a sovereign client. If you lead an Omani firm and you are evaluating where AI fits without compromising privilege, email [email protected] for a one-hour briefing. We will walk through the architecture, the corpus pipeline, and a phased plan sized to your bench.
Frequently asked
Does pasting a draft into ChatGPT really waive attorney-client privilege?
In the United States, Judge Jed Rakoff's February 2026 ruling in United States v. Heppner held that disclosing information to a public AI tool constitutes third-party disclosure that destroys privilege. The Omani Bar has not issued a parallel ruling, but the analysis maps cleanly. Once a privileged document leaves the firm to a foreign operator under terms that allow training, retention, and government access, the firm cannot argue it kept the matter confidential. The conservative posture is to assume privilege is at risk on every public-cloud AI session.
What hardware does a 30-lawyer Omani firm need to run an on-premise LLM?
A single Hosn Tower with one NVIDIA RTX PRO 6000 Blackwell card and 96 GB of GPU memory runs a 70B-class model in FP8 with full RAG against a multi-gigabyte firm corpus and serves twenty to fifty concurrent lawyers. A ten-lawyer boutique can sit on a Hosn Kernel built around a Mac Studio M3 Ultra. A national firm with a hundred lawyers and a litigation department moves to a Rack with H100 or H200 acceleration. The sizing follows concurrency, not headcount.
Can the model be put under legal hold for a litigation matter?
Yes. Because every prompt, every retrieved passage, and every generation is logged on disks the firm controls, the matter team can preserve a snapshot of the index, the model weights, and the inference logs at the moment a hold is issued. With a public-cloud assistant, the firm depends on a vendor process the opposing side can challenge. On-premise, the audit trail is a file system the firm can produce directly to the court.
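A hedged sketch of what executing that hold can look like; the source paths, matter identifier, and hold destination are placeholders for the firm's own storage layout and litigation-support process.

```python
# Legal-hold snapshot sketch: copy the index, model weights and inference logs
# to hold storage and write a SHA-256 manifest. All paths are placeholders.
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path

SOURCES = [Path("/srv/hosn/index"), Path("/srv/hosn/models"), Path("/srv/hosn/logs")]

def sha256_file(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):   # hash in 1 MB blocks
            digest.update(block)
    return digest.hexdigest()

def legal_hold_snapshot(matter_id: str, hold_root: Path) -> Path:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = hold_root / f"{matter_id}-{stamp}"
    manifest = []
    for src in SOURCES:
        shutil.copytree(src, dest / src.name)
        for f in sorted((dest / src.name).rglob("*")):
            if f.is_file():
                manifest.append(f"{sha256_file(f)}  {f.relative_to(dest)}")
    (dest / "MANIFEST.sha256").write_text("\n".join(manifest) + "\n")
    return dest
```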
How does this differ from Casetext CoCounsel or Westlaw AI?
The vendor tools are RAG over a US case-law and statutes corpus, served from the vendor's cloud. Stanford RegLab's 2024 study found Westlaw AI and Lexis+ AI hallucinated on 17 to 33 percent of legal queries despite RAG architectures. The Omani on-premise pattern differs in three ways: the corpus is the firm's own work product plus the Omani public corpus in Arabic and English, the inference stays inside the perimeter, and the firm sets the citation-audit threshold rather than accepting a vendor SLA.