Fine-Tuning Gemma 4 on an Omani Legal Corpus

An Omani sovereign legal team that wants an LLM to draft, summarise, and cite local law cannot rely on a generic open-weight model. The vocabulary of Sultani Decrees, the structure of ministerial decisions, and the citation idiom of the Court of Cassation are absent from any frontier pre-training mix. The good news: a thin LoRA adapter on Gemma 4, trained on a properly curated Omani legal corpus, closes the gap on a single H100 in under a day. This article is the practical recipe, the same one that sits behind our broader pillar article on LoRA/QLoRA on-premise fine-tuning. Everything happens inside the perimeter.

The Omani legal corpus

A credible training set draws from five primary sources. Each has its own voice, register, and citation pattern, and the adapter must learn to respect all five.

  • Sultani Decrees (المراسيم السلطانية) issued in the Official Gazette, including their published amendments and explanatory annexes. The base law of the land.
  • Ministerial decisions (القرارات الوزارية) from the Ministry of Justice and Legal Affairs, the Ministry of Commerce, Industry and Investment Promotion, the Ministry of Labour, the Capital Market Authority, and the Tax Authority. The operating layer where decrees become rules.
  • Court of Cassation rulings (أحكام المحكمة العليا) and their annual principle digests. The interpretive layer that fixes what the text actually means in practice.
  • Regulator circulars from the Central Bank of Oman, the Authority for Public Services Regulation, and the National Centre for Statistics and Information. Sector-specific operating guidance.
  • Counsel-drafted memoranda contributed by the institution itself, anonymised, with explicit permission. The bridge from law to practice for that specific institution.

For Omani buyers, the canonical front door to decrees and ministerial decisions is the Qanoon legislative portal maintained by the Ministry of Justice and Legal Affairs. Always pull the official PDFs and gazette references rather than scraped third-party copies.

Data preparation

Three disciplines decide whether the adapter is worth promoting.

Deduplication. Decrees, amendments, and consolidated reissues create heavy near-duplicate clusters. Estimate Jaccard similarity with MinHash at the paragraph level, keep the latest consolidated version as the canonical example, and retain the prior versions only where the adapter must learn the amendment relationship explicitly. Without this step, training loss collapses onto a tiny minority of repeated paragraphs and the adapter never learns the long tail.
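A minimal sketch of that deduplication pass, using the open-source datasketch library. The 3-token shingles, 128 permutations, and 0.9 threshold are illustrative defaults, not tuned values from our runs.

```python
from datasketch import MinHash, MinHashLSH

def minhash_of(paragraph: str, num_perm: int = 128) -> MinHash:
    """MinHash signature over 3-token shingles of one paragraph."""
    m = MinHash(num_perm=num_perm)
    tokens = paragraph.split()
    for i in range(max(1, len(tokens) - 2)):
        m.update(" ".join(tokens[i:i + 3]).encode("utf-8"))
    return m

def near_duplicate_clusters(paragraphs: dict[str, str], threshold: float = 0.9) -> list[set[str]]:
    """Group paragraph ids whose estimated Jaccard similarity exceeds the threshold."""
    lsh = MinHashLSH(threshold=threshold, num_perm=128)
    sigs = {pid: minhash_of(text) for pid, text in paragraphs.items()}
    for pid, sig in sigs.items():
        lsh.insert(pid, sig)
    clusters, seen = [], set()
    for pid, sig in sigs.items():
        if pid in seen:
            continue
        cluster = set(lsh.query(sig))  # ids estimated above the Jaccard threshold
        seen |= cluster
        clusters.append(cluster)
    return clusters
```

Within each cluster, keep the latest consolidated version as canonical and drop or down-weight the rest, per the rule above.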

Decree-vs-summary tagging. The corpus must be paired. Every long source becomes at least three training examples: the verbatim text, a faithful summary, and a question that demands a cited answer with a paragraph anchor. Tag each pair with structured metadata (source_type, issue_year, arabic_register, citation_form) so the trainer can re-balance the mix. The balance that has held up in our internal runs is roughly 40% verbatim, 30% summary, 30% cited Q&A.
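Concretely, one decree expands into three records in the training set. The metadata keys below are the four tags named above; the task names and field layout are an illustrative schema, not a fixed format.

```python
def make_examples(decree_text: str, summary: str, question: str,
                  cited_answer: str, meta: dict) -> list[dict]:
    """Expand one source document into the verbatim / summary / cited-Q&A triple."""
    base = {k: meta[k] for k in
            ("source_type", "issue_year", "arabic_register", "citation_form")}
    return [
        {**base, "task": "verbatim", "text": decree_text},
        {**base, "task": "summary",  "prompt": decree_text, "completion": summary},
        {**base, "task": "cited_qa", "prompt": question,    "completion": cited_answer},
    ]
```

Re-balancing to the 40/30/30 mix then reduces to sampling on the task field.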

Arabic OCR cleanup. Older gazette PDFs are scans, not native PDF text. Run a modern Arabic-aware OCR pipeline, then a deterministic post-processor that fixes the common artefacts: hamza variants on alif (إ، أ، آ، ا), tah marbuta vs ha (ة، ه), inverted paragraph numbering, and broken kashida runs. Arabic legal text without OCR cleanup poisons the adapter: the model learns the OCR errors as if they were canonical Arabic.
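A conservative sketch of the deterministic post-processor. It folds presentation-form ligatures, strips kashida, and re-joins split article numbers; it deliberately leaves hamza and tah-marbuta repair to a dictionary-backed step, because blanket normalisation can change legal meaning. The المادة pattern is illustrative.

```python
import re
import unicodedata

TATWEEL = "\u0640"  # kashida character that OCR scatters through justified lines

def clean_ocr_arabic(text: str) -> str:
    """Deterministic cleanup of common Arabic OCR artefacts; conservative by design."""
    # Fold Arabic presentation-form ligatures back to base letters.
    text = unicodedata.normalize("NFKC", text)
    # Strip kashida runs left over from broken justification.
    text = text.replace(TATWEEL, "")
    # Re-join article numbers that OCR pushes onto their own line,
    # e.g. "المادة\n(5)" -> "المادة (5)".
    text = re.sub(r"(المادة)\s*\n\s*\((\d+)\)", r"\1 (\2)", text)
    # Collapse stray intra-line whitespace.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```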

LoRA recipe on Gemma 4

The recipe below has worked reliably for our internal runs and tracks the official Hugging Face PEFT recipe for Gemma; a condensed code sketch follows the list. Use Gemma 4 26B-A4B for legal-research throughput, Gemma 4 31B dense for the most rigorous correspondence, and reserve the 100B-A11B variant for evaluation rather than production fine-tuning on classified material.

  • Adapter shape: rank 32 to 64, alpha 64 to 128 (alpha = 2x rank is the safe default), dropout 0.05.
  • Target modules: the full attention plus MLP set, ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]. Omitting the MLP modules costs measurable Arabic legal accuracy.
  • Learning rate: 2e-4 for SFT on the 26B-A4B variant, 1e-4 for the 31B dense. Linear warmup over 3% of steps, cosine decay after.
  • Sequence length: 16K tokens for the bulk of the run, with a final epoch at 32K to teach the long-context citation behaviour.
  • Optimiser: paged_adamw_8bit with gradient accumulation 8 to 16. Effective batch size 32 to 64 across two epochs is the sweet spot for a 30,000-example corpus.
  • Frameworks: Hugging Face trl SFTTrainer plus peft on PyTorch, optionally orchestrated by axolotl for reproducible YAML-driven runs.
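The bullets above map directly onto a peft plus trl configuration. In the condensed sketch below, the model id is a placeholder for your locally mirrored, digest-pinned checkpoint, train_dataset is the paired corpus from the data-preparation section, and the exact SFTConfig field names (max_seq_length, processing_class) track recent trl releases, so verify against your pinned version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

MODEL_ID = "models/gemma-4-26b-a4b"  # placeholder: your local, SHA-256-pinned checkpoint

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit NF4 base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 adapter maths
)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

lora = LoraConfig(
    r=32, lora_alpha=64, lora_dropout=0.05,  # alpha = 2x rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="adapters/omani-legal-v1",
    learning_rate=2e-4,                # 26B-A4B; drop to 1e-4 for the 31B dense
    warmup_ratio=0.03,                 # linear warmup over 3% of steps
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    per_device_train_batch_size=4,     # illustrative; size to VRAM
    gradient_accumulation_steps=8,     # effective batch 32
    num_train_epochs=2,
    max_seq_length=16384,              # raise to 32768 for the final long-context epoch
    bf16=True,
)

trainer = SFTTrainer(model=model, args=args, peft_config=lora,
                     train_dataset=train_dataset,  # paired corpus, assumed prepared
                     processing_class=tokenizer)
trainer.train()
```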

Hardware

One H100 80GB is the practical reference. The 26B-A4B variant in QLoRA at 4-bit base with a BF16 adapter consumes roughly 38 to 46 GB at 16K sequence length, leaving headroom for KV cache and the optimiser state. The 31B dense fits the same machine in QLoRA mode at rank 32. Wall-clock for a 30,000-example corpus at two epochs is roughly 12 to 24 hours, depending on variant and sequence mix. An H200 141GB cuts that by 40 to 50% and lets you push to rank 128 if a particular sub-corpus warrants it. Hosn ships these as Tower and Rack appliances with axolotl, peft, trl, and bitsandbytes pre-installed and air-gap-friendly.

Eval methodology and rollback

An adapter that lawyers can rely on passes three gates before promotion. Automatic suites catch knowledge regressions: ALUE and ArabicMMLU on general Arabic, a custom in-house legal MCQ set on legal knowledge. Gold-set RAG evaluation on 500 senior-counsel-written questions, scored on exact citation match plus a four-point rubric, catches reasoning regressions. An adversarial red-team pass (asking the adapter to fabricate decrees, invent case numbers, or summarise repealed laws) sets the hard floor: the fabrication rate must be near zero before promotion to live.
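The exact-citation-match half of the second gate reduces to a simple check. A minimal sketch, assuming each gold-set record carries the list of citations a correct answer must reproduce verbatim; the 0.95 promotion threshold is illustrative.

```python
def citation_exact_match(answer: str, gold_citations: list[str]) -> float:
    """Fraction of the gold citations reproduced verbatim in the model answer."""
    if not gold_citations:
        return 1.0
    return sum(1 for c in gold_citations if c in answer) / len(gold_citations)

def gate_passes(per_question_scores: list[float], threshold: float = 0.95) -> bool:
    """Promotion gate over the 500-question gold set (threshold is illustrative)."""
    return sum(per_question_scores) / len(per_question_scores) >= threshold
```

The four-point rubric half still needs senior counsel or a calibrated judge; the exact-match half keeps it honest.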

Treat every adapter as a versioned, signed asset. Pin the base Gemma 4 weights by SHA-256, sign the adapter with the institution's HSM, store the pair under a documented rollback path, and keep the previous adapter hot-swappable. When a fresh decree arrives or a ruling overturns a precedent, you re-train, re-evaluate, and roll forward. If the new adapter regresses on any gate, you roll back in seconds, not days. That is what sovereign fine-tuning looks like in practice.
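The pinning step itself is a few lines of hashing plus a manifest. A sketch, assuming safetensors shards for the base weights; the HSM signing call is vendor-specific and appears only as a placeholder.

```python
import hashlib
import json
import pathlib
from datetime import datetime, timezone

def sha256_of(path: pathlib.Path) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(base_dir: str, adapter_dir: str, out_path: str) -> None:
    """Pin base weights and adapter files by digest; signing happens out-of-band."""
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "base_weights": {p.name: sha256_of(p)
                         for p in sorted(pathlib.Path(base_dir).glob("*.safetensors"))},
        "adapter": {p.name: sha256_of(p)
                    for p in sorted(pathlib.Path(adapter_dir).iterdir()) if p.is_file()},
        "rollback_to": "adapters/omani-legal-v0",  # previous adapter, kept hot-swappable (illustrative)
    }
    pathlib.Path(out_path).write_text(json.dumps(manifest, indent=2))
    # hsm_sign(out_path)  # vendor-specific HSM call; placeholder only
```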

If your legal directorate is sizing a Gemma 4 fine-tuning programme on Omani sources and would like a one-hour briefing tailored to your corpus, classification posture, and hardware envelope, the next step is simple. Email [email protected] or message +968 9889 9100. We will come to you, walk through the recipe, and leave a credible plan against your timeline. Pricing is by quotation, sized to your specific requirement.

Frequently asked

Why fine-tune Gemma 4 instead of Falcon Arabic for an Omani legal corpus?

Gemma 4 wins on context length. Its 256K window on the 26B-A4B and 31B variants comfortably ingests an entire Sultani Decree plus its amendments, the relevant ministerial decision, and the Court of Cassation ruling that interpreted it, in one prompt. Falcon Arabic remains stronger on classical Arabic comprehension and sharia-only retrieval, and many Omani legal teams run both. The pragmatic answer is to fine-tune Gemma 4 as the default legal-research workhorse and keep Falcon Arabic adapters in the same appliance for sharia review and classical-Arabic case law.

Can a single H100 80GB really fine-tune Gemma 4 on Omani legal data?

Yes. The 26B-A4B mixture-of-experts variant accepts QLoRA at 4-bit base with a BF16 LoRA adapter on a single H100 80GB. The 31B dense variant fits the same machine in QLoRA mode with rank 32 to 64. Realistic training time for a 30,000-example Omani legal corpus at 16K context is 12 to 24 hours per adapter on one H100, with another 4 to 8 hours of evaluation and review. The H200 variant cuts that roughly in half. No multi-node cluster, no cloud, no exfiltration.

What is the minimum size of a usable Omani legal training corpus?

Aim for 10,000 to 50,000 deduplicated, paired examples. Below 10,000, the adapter learns vocabulary and tone but not legal reasoning structure. Above 50,000, returns diminish for a single jurisdiction. The recipe blends Sultani Decrees and their published amendments, ministerial and regulator decisions, Court of Cassation principle digests, official gazette annexes, and curated counsel-drafted memoranda. Pair every long source with a faithful summary, an issue-spotting prompt, and a citation-with-paragraph-anchor variant. Quality of the pairing matters more than raw corpus size.

How do we evaluate the fine-tuned adapter before letting lawyers use it?

Run three layers of evaluation. First, automatic Arabic NLP suites: ALUE, ArabicMMLU, and a custom in-house legal MCQ set built from past examination questions. Second, retrieval-augmented question answering against a held-out 500-question gold set written by senior counsel, scored by exact citation match plus expert rubric. Third, an adversarial red-team pass that asks the adapter to fabricate decrees, invent case numbers, or summarise repealed laws; the failure rate must be near zero before promotion. Every adapter version is signed, dated, and kept under a documented rollback path.