Why does Arabic register need a different prompt strategy than English?

Arabic is diglossic: Modern Standard Arabic, the Omani-formal sub-register used in ministerial correspondence, and everyday spoken Khaleeji are three distinct varieties. A bare instruction in English produces fluent MSA news prose by default, which lands one or two notches below the register a ministry cabinet actually uses. The model needs explicit register anchors and few-shot exemplars in the target sub-register to settle into the right voice.

What are the most common register failures to watch for?

Four recurring failures: Levantine intrusions like بنين or طفشت leaking from training data, news-wire tone replacing ministerial diction, English-calque sentence shapes (lead with the verb instead of the noun phrase), and Latin tokens like model names or acronyms breaking RTL bidi when followed by Arabic punctuation. Each has a deterministic fix, listed in the article.

How many few-shot exemplars are enough for ceremonial register?

Three to five exemplars cover most institutional registers. Ceremonial drafting (royal decrees, condolences, congratulations) sits at the higher end. Administrative correspondence and technical memos settle quickly with three. Past about seven exemplars the gain flattens and prompt cost rises; beyond that, a small LoRA fine-tune on the secretariat corpus pays back faster than longer prompts.

Should the system prompt itself be in Arabic or English?

Arabic, when the output is Arabic. Mixing system instructions in English with target output in Arabic measurably increases register drift on Qwen and Gemma open-weight models. Keep the system prompt in MSA, anchor the register with one paragraph of style description, then attach exemplars in the same register. Reserve English only for tool-call schemas the model never reads as natural language.

Prompt Engineering for Formal Arabic Government Tone, Hosn Blog

An Omani ministry cabinet does not draft like a newspaper, and a newspaper does not draft like a Royal Court secretariat. Each register sits in a narrow lane, and a model that misses the lane produces text the institution will not sign. Open-weight models like Qwen 3.6 and Gemma 4 can hit any of those lanes accurately, but only when the prompt explicitly anchors the register, supplies in-domain exemplars, and blocks the failure modes that leak in from training data. This piece is the field guide we use when standing up sovereign drafting assistants for ministries, secretariats, and regulators in Oman.

Why Arabic register matters more than English

Arabic is diglossic in a way English is not. Linguists describe at least three coexisting layers: the Classical and Modern Standard register used in writing and formal speech, the regional vernacular spoken at home, and intermediate "educated spoken" varieties that mix the two. Stanford's overview of Arabic diglossia describes the gap as functionally similar to two related languages sharing a script. Inside MSA itself, ministerial-formal Omani prose is a further sub-register, with its own opening invocations, honorifics, and statutory citation style. A model that produces "good MSA" can still produce text that an Omani undersecretary will not sign.

Public-facing models default to news-wire MSA because that is the densest portion of their pretraining mix. That voice is fluent, neutral, and one or two notches below ministerial diction. It is also stylistically Levantine in many releases, because the Arabic web tilts that way. For an Omani buyer, the practical implication is that Levantine intrusions like بنين or طفشت leak into drafts, and Egyptian sentence rhythms appear where Khaleeji ones belong. The fix is not a better model; it is a better prompt. Surveys on style transfer between Arabic dialects and MSA document the same pattern: register signals dominate model output unless the prompt actively pins the target.

Anchor patterns: ceremonial, administrative, technical

Every institutional register can be reduced to a system-prompt skeleton plus three to five exemplars. The skeleton names the register, names the audience, names the forbidden registers, and names the structural conventions. Below are the three skeletons we keep in production.

Ceremonial register. Used for royal speeches, decrees, condolence and congratulation cables, and bilateral palace correspondence. The system prompt opens in Arabic, names the register as السجل الاحتفالي الرسمي, lists three forbidden styles (news-wire, Levantine, English-calque), and instructs the model to open every draft with the appropriate religious or sovereign invocation. Exemplars are drawn from the secretariat's own corpus where possible, otherwise from published royal speeches.

Administrative register. Used for ministerial circulars, inter-ministry memoranda, undersecretary correspondence, and procurement letters. The skeleton names the register as السجل الإداري الرسمي, requires noun-phrase openings (subject, then verb), enforces the Hijri-and-Gregorian date pair, and lists the standard closing formulas. Three exemplars usually settle the model.

Technical register. Used for regulatory guidance notes, standards documents, and ministry-issued technical specifications. The skeleton names the register as السجل التقني الرسمي, allows controlled English-loan terms in parentheses on first mention, requires consistent statutory citation form, and disables the ceremonial invocations that would feel out of place in a standards document. The few-shot Arabic prompt literature, including the Arabic instruction-tuning datasets released since 2024, shows the same skeleton applies cleanly across Qwen, Gemma, and Falcon-Arabic backends.

Common register failures and fixes

Four failures account for almost every rejected draft. Each has a deterministic fix that belongs in the system prompt rather than in post-edit.

Levantine intrusion. Words like بنين, صبية, طفشت appear in drafts intended for Omani readers. Fix: an explicit forbidden lexicon block in the system prompt, plus a one-line instruction to prefer اللاعبين, المشاركين, تعبت in the equivalent slots.
News-wire drift. Drafts open with verbs and read like agency copy. Fix: instruct the model to lead with the noun phrase that names the issuing authority, and to place the action verb in second position. Add one ceremonial exemplar to break the pattern decisively.
English-calque syntax. Sentences that mirror English clause order, often with Anglicised conjunction usage. Fix: require the model to think in جملة اسمية structure for openings, and add a single negative exemplar showing what to avoid.
Bidi breakage on Latin tokens. Acronyms and model names like PDPL or Qwen, when followed by Arabic punctuation, render in the wrong direction. Fix: a one-line instruction in the system prompt to wrap every Latin run in <bdi dir="ltr"> tags before the trailing Arabic punctuation.

None of these requires fine-tuning. They are prompt-level fixes that pay back from the first call.

A templating system for institutional consistency

A working drafting assistant is not a single prompt; it is a small library. Each institutional register gets a versioned template, the template carries its forbidden-lexicon list, and every prompt the model sees is rendered from a template plus the live request. The library is owned by the institution, stored alongside the model weights, and audited the same way style guides are audited today. New registers (a new ministry, a new bilateral programme) are added as new templates rather than as patches to existing ones.

The benchmarking story for the underlying model still matters, of course; templates ride on top of model quality. The pillar reference for that side of the choice is Qwen 3.6 Arabic NLP, which sets out the evaluation suites and serving sizing for the most common open-weight Arabic backend in 2026. Pair a strong backend with a disciplined template library and the institution gets ministerial output the first time, not the third.

If your secretariat or ministry is sizing a drafting assistant against this shape of brief, the next step is a one-hour briefing on prompt architecture, register anchors, and the templating workflow. Email [email protected] or message +968 9889 9100. Pricing is by quotation, sized to the institutional scope.

Why Arabic register matters more than English

Anchor patterns: ceremonial, administrative, technical

Common register failures and fixes

A templating system for institutional consistency

Frequently asked

Related

Qwen 3.6 for Arabic NLP: Benchmarks, Strengths, and Production Deployment

Bilingual AI for Diwan and Royal Court Correspondence

Arabic Instruction-Tuning Datasets for Sovereign Deployments