AI for Healthcare in the Sultanate: Arabic Medical Scribing and Patient Notes
An outpatient consultant at a tertiary referral hospital in Muscat sees 28 patients between 8am and 2pm. After clinic, she stays until 5:30pm to finish notes from memory. Her colleague in radiology dictates between scans into a Dictaphone whose transcripts come back two days later, full of mistransliterated Arabic drug names that he has to fix line by line. Multiply this by the consultant body of an Omani Ministry of Health hospital and you arrive at the single biggest, least dramatic crisis in modern medicine: clinicians spend one to two hours every working day on documentation rather than care. AI scribing changes that math. This guide walks through how, specifically for an Arabic-first, Omani context.
This is the pillar guide for healthcare. It covers the burden problem, what ambient clinical intelligence actually does, why Arabic and on-premise are non-negotiable for Omani hospitals, the technical pipeline (Whisper Large V3 plus Qwen 3.6 or Falcon Arabic), hardware sizing for the Sultanate's tertiary hospitals, validation and hallucination control, the regulatory posture under Ministry of Health policies and Royal Decree 6/2022, and a phased deployment plan that starts in one specialty and scales hospital-wide.
The clinician burden problem
Documentation is the single largest non-clinical workload in modern medicine. The widely cited Annals of Internal Medicine time-and-motion study found US ambulatory physicians spending nearly two hours on the electronic medical record and clerical work for every hour of direct patient contact, with an additional one to two hours of after-hours documentation. Subsequent national surveys correlate this load directly with burnout, reduced clinical capacity, and intent to leave practice.
The Omani picture is similar in shape, harder in language. A Ministry of Health consultant routinely consults in Arabic, switches into English for clinical terminology, drug names, and dose units, and then types the resulting note into an EMR that expects English structured fields. Junior doctors absorb this translation overhead. Senior consultants offload it to scribes when scribes are available, and to evening hours when they are not. Either way, time that could be spent on the next patient or on teaching becomes typing time.
The arithmetic of even a modest scribe deployment is significant. If a Tower-class system saves a consultant 60 minutes per day across a five-day clinic week, that is 250 hours per year per consultant freed for clinical work. Across a 200-consultant tertiary hospital, that approaches 50,000 clinical hours per year, the rough equivalent of 25 additional full-time consultant posts at no headcount cost. The same multiplier applies in radiology reporting, ED documentation, and inpatient progress notes.
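That arithmetic can be checked with a short sizing calculation. The 50 working weeks and the 2,000-hour consultant year below are assumptions for illustration, not Ministry figures:

```python
def clinical_hours_freed(minutes_saved_per_day: float,
                         clinic_days_per_week: int,
                         working_weeks_per_year: int,
                         consultants: int) -> float:
    """Annual clinical hours freed across a consultant body."""
    hours_per_consultant = (minutes_saved_per_day / 60) \
        * clinic_days_per_week * working_weeks_per_year
    return hours_per_consultant * consultants

# 60 min/day saved, 5 clinic days, 50 working weeks, 200 consultants
total_hours = clinical_hours_freed(60, 5, 50, 200)   # 50,000 hours/year
fte_equivalent = total_hours / 2000                  # ~25 full-time posts
```

Changing any input scales the result linearly, which makes the same formula usable for a single-department pilot or a hospital-wide estimate.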
What ambient clinical intelligence does
Ambient clinical intelligence (ACI) is the technical name for a system that listens passively to a clinical encounter, transcribes the conversation, and converts the transcript into a structured medical note that a clinician reviews, edits, and signs. The category was popularised commercially by Nuance's DAX product (now part of Microsoft) and by newer entrants such as Abridge and Suki, all serving large US health systems.
The flow at a typical encounter is straightforward. The clinician opens an encounter on a tablet or phone and presses record. The microphone captures the conversation between clinician and patient. The audio streams to a transcription model that produces speaker-separated text in near real time. As the encounter ends, a language model takes the verbatim transcript and produces a structured note in the institution's preferred format, usually a SOAP note (Subjective, Objective, Assessment, Plan) plus a problem list, medication list, allergies update, and follow-up instructions. The clinician opens the draft, edits as needed, and signs. The signed note enters the EMR. Audio is discarded.
What changes is not what the doctor does, but where the cognitive load lands. Instead of dividing attention between the patient and a screen, the clinician can give the encounter undivided attention, then spend two to three minutes reviewing a draft rather than fifteen minutes constructing one from memory. Studies of ACI deployments in US health systems consistently report meaningful reductions in after-hours documentation and improvements in clinician-reported burnout, while clinical quality remains at least equivalent.
Why Arabic-first Oman demands on-premise
Three properties make Oman's healthcare context different from the US deployments that dominate the literature, and all three push hard toward on-premise rather than cloud.
Patient data residency. Patient health information is among the most sensitive personal data categories under Oman's Personal Data Protection Law, Royal Decree 6/2022, fully enforceable since 5 February 2026. Cross-border transfer is conditioned on safeguards. Sending consultation audio to a foreign cloud transcription service, even encrypted, creates a regulatory and reputational exposure that no medical director will sign off on lightly. On-premise transcription removes the question entirely.
Arabic medical vocabulary and dialect. Generic transcription services are tuned for Modern Standard Arabic and high-volume dialects (Egyptian, Levantine). Omani clinical Arabic is a specific code-switching register: MSA for formal phrasing, Gulf dialect for patient narrative, English for medication names, dosing, and ICD codes. A model trained generically on Arabic web text mishears drug names that a local model fine-tuned on hospital data gets right ninety-five times in a hundred. The accuracy gap is not academic. It is the difference between a useful assistant and an unsafe one.
Integration with sovereign EMR systems. The Ministry of Health, the Royal Hospital, the Sultan Qaboos University Hospital, and the major teaching hospitals run electronic medical record platforms with internal interfaces, often customised. A scribe system has to write into those EMRs through the institution's own integration layer, behind its own firewall, using its own authentication. Cloud SaaS scribes are not architected for this. On-premise systems are.
Open-weight Arabic models reach the quality bar that this set of constraints implies. Falcon Arabic from the Technology Innovation Institute in Abu Dhabi tops the Open Arabic LLM Leaderboard at the 3B, 7B, and 34B scales. Qwen 3.6 covers two hundred languages including dialect variants and supports the kind of code-switching that real Omani clinical speech contains. Both models can run fully air-gapped on hospital-owned hardware.
The technical pipeline
The reference architecture for an Arabic medical scribe has four stages, all running on the institution's hardware, with audio never leaving the perimeter.
Stage 1, capture. The clinician's tablet, phone, or desk microphone records the encounter. Audio is encrypted in transit to the on-premise inference cluster on the hospital network. Recording starts and stops are explicit clinician actions, never automatic.
Stage 2, transcription. The audio enters Whisper Large V3, OpenAI's open-weight automatic speech recognition model. Whisper handles 99 languages including Arabic; paired with a speaker-diarisation step, the pipeline produces speaker-tagged text. For a hospital deployment, the base model is supplemented by a domain biasing list of medications (the institution's formulary), anatomy, common ICD codes, and consultant names, which raises accuracy on the items that matter most. Faster-Whisper or whisper.cpp implementations on GPU run at five to ten times real time, so a 20-minute consultation transcribes in roughly two to four minutes.
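A minimal sketch of the biasing step, assuming a Faster-Whisper deployment. The term lists and the `build_bias_prompt` helper are illustrative, not part of any shipped product; the real lists come from the pharmacy formulary and the staff directory:

```python
def build_bias_prompt(formulary: list[str], anatomy: list[str],
                      consultants: list[str], max_terms: int = 120) -> str:
    """Concatenate high-priority domain terms into an initial prompt that
    biases Whisper's decoder toward the hospital's own vocabulary."""
    terms = (formulary + anatomy + consultants)[:max_terms]
    return "Clinical consultation. Terminology: " + ", ".join(terms)

bias = build_bias_prompt(
    ["metformin 500 mg", "amlodipine 5 mg", "enoxaparin"],
    ["left ventricle", "carotid artery"],
    ["Dr Al-Busaidi"],
)

# With Faster-Whisper on the on-premise GPU host (sketch, not run here):
# from faster_whisper import WhisperModel
# model = WhisperModel("large-v3", device="cuda", compute_type="float16")
# segments, info = model.transcribe("encounter.wav", language="ar",
#                                   initial_prompt=bias)
```

The `initial_prompt` mechanism conditions the decoder on the supplied terms; it is cheaper than fine-tuning and is typically the first step before a fine-tune is justified by evaluation data.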
Stage 3, structured note generation. The verbatim transcript goes to an Arabic-capable language model, typically Qwen 3.6 (general medical work, fast turnaround) or Falcon Arabic 34B (when Arabic correctness is the dominant requirement, for example in cardiology dictation or hospital discharge summaries that go directly to patients). The model receives a system prompt that defines the institution's note format, the SOAP structure, the requirement to ground every fact in the transcript, and a forbidden-actions list (no diagnoses the clinician did not state, no treatment recommendations beyond what was discussed, explicit flagging of any item that could not be sourced).
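The system prompt described above can be assembled along these lines. The wording and the `build_note_request` helper are illustrative; the resulting messages would be POSTed to whatever OpenAI-compatible endpoint the on-premise model serves behind (vLLM, llama.cpp server, and similar):

```python
SOAP_SYSTEM_PROMPT = """You are a clinical documentation assistant.
Rules:
- Produce a SOAP note (Subjective, Objective, Assessment, Plan) plus
  problem list, medication list, allergies update, and follow-up.
- Ground every clinical fact in the transcript and cite its line as [T<n>].
- Do not add diagnoses the clinician did not state.
- Do not recommend treatment beyond what was discussed.
- Mark any item you could not source in the transcript as [UNVERIFIED]."""

def build_note_request(transcript: str, note_format: str = "SOAP") -> list[dict]:
    """Assemble the chat messages sent to the on-premise model endpoint."""
    return [
        {"role": "system", "content": SOAP_SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Note format: {note_format}\nVerbatim transcript:\n{transcript}"},
    ]
```

Keeping the forbidden-actions list in the system prompt, rather than scattered through user messages, makes it auditable: the clinical governance committee reviews one document, and prompt changes go through change management like any other configuration.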
Stage 4, post-processing. The draft note runs through validation: medication doses checked against the formulary, ICD codes validated against the active code set, drug-drug interaction screen, and a side-by-side render of transcript and note for the clinician's review. The clinician edits, signs, and the note enters the EMR through the institution's standard integration layer. Audio and intermediate transcript are discarded according to a documented retention policy, typically immediate or within 24 hours.
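One of the validation checks, sketched in miniature. The two-drug formulary and its dose ranges are placeholders for the pharmacy's real data, and a production check would also handle units other than milligrams:

```python
import re

FORMULARY = {  # illustrative subset; the real table comes from pharmacy
    "metformin": {"min_mg": 250, "max_mg": 1000},
    "amlodipine": {"min_mg": 2.5, "max_mg": 10},
}

def check_doses(note_text: str) -> list[str]:
    """Return warnings for doses outside the formulary's accepted range."""
    warnings = []
    for drug, dose, unit in re.findall(
            r"(\w+)\s+(\d+(?:\.\d+)?)\s*(mg)", note_text.lower()):
        entry = FORMULARY.get(drug)
        if entry is None:
            continue  # unknown token; a separate check flags unlisted drugs
        if not entry["min_mg"] <= float(dose) <= entry["max_mg"]:
            warnings.append(f"{drug} {dose} {unit} outside formulary range")
    return warnings

check_doses("Plan: metformin 5000 mg twice daily")  # flags the dose
```

The point of the sketch is the shape of the guard: deterministic rules over the draft, producing warnings the clinician must see, rather than asking the language model to police itself.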
Retrieval-augmented generation can be added so the note generation step also pulls the patient's prior history from the EMR (allergies, active medications, last visit summary) and incorporates it into the new note's continuity sections. This is where ambient clinical intelligence stops being a transcription tool and becomes a working clinical assistant.
Hardware sizing
Sizing follows concurrent active dictations, not headcount. A single transcription occupies a model instance for the duration of the consultation; a typical 15 to 25 minute encounter runs through Whisper plus note generation in roughly 4 to 6 minutes of GPU time on modern accelerators. The peak load is mid-clinic, when several rooms are dictating simultaneously.
Pilot tier, single specialty. A Hosn Tower configuration (NVIDIA RTX 6000 Ada or Blackwell, 96 GB GPU memory, high-clock CPU host) handles 40 to 60 concurrent active transcriptions with note generation, the right capacity for a single outpatient department running 8 to 12 simultaneous clinics. This is the natural pilot footprint for one specialty before scaling.
Departmental tier, multi-specialty. Two or three Towers in a load-balanced cluster, or a small Rack with two H100 or H200 accelerators, serves 100 to 200 concurrent encounters across multiple departments. This is the right tier for a regional hospital or the larger ambulatory wings of a tertiary centre.
Tertiary referral hospital tier. A Hosn Rack with four to eight H100 or H200 accelerators, NVMe storage in the tens of terabytes for transcripts and audio buffers, and redundant power, supports 300 to 600 concurrent encounters with headroom for retrospective batch processing (overnight re-transcription, fine-tuning runs, departmental analytics). At this scale, separate model pools handle different specialties (radiology dictation has different latency requirements than outpatient SOAP notes), and the system runs more than one Arabic LLM in parallel routed by task.
The right starting point is one tier below your steady state, with explicit upgrade headroom. Buying tertiary-hospital hardware for a pilot wastes budget. Buying pilot hardware for a tertiary deployment creates queueing failures the moment the system gets busy.
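The sizing logic above reduces to a simple planning formula. Every number in the example call (GPU minutes per encounter, utilisation target) is an illustrative placeholder to be replaced by measured figures from the shadow-mode evaluation:

```python
import math

def gpu_equivalents(concurrent_encounters: int,
                    encounter_minutes: float,
                    gpu_minutes_per_encounter: float,
                    utilisation_target: float = 0.6) -> int:
    """Accelerator-equivalents needed so transcription and note generation
    keep pace with in-flight encounters, with utilisation headroom."""
    demand = concurrent_encounters * gpu_minutes_per_encounter / encounter_minutes
    return math.ceil(demand / utilisation_target)

# 50 concurrent encounters, 20-minute visits, ~2 GPU-minutes each (placeholders)
gpu_equivalents(50, 20, 2)
```

Because demand scales linearly with concurrency, the same call answers the upgrade question: re-run it with the projected multi-specialty load before committing to a tier.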
Validation and clinician-in-loop
Validation is the discipline that makes ambient clinical intelligence safe enough to deploy. Three layers matter.
Layer 1, pre-deployment evaluation. Before live use, transcription accuracy is measured on a labelled sample of 100 to 500 redacted consultations across the planned specialties. Word error rate (WER) for general speech, plus a separate medical-term WER specifically for medications and anatomy, gives a credible accuracy floor. Note quality is evaluated by a panel of consultants against agreed criteria (faithfulness to transcript, structural correctness, completeness, omission of inappropriate inferences) on a separate sample. The deployment proceeds only when both metrics meet the agreed threshold.
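Both metrics can be computed directly from the labelled sample. The medical-term filter below is a simplified proxy: a mistranscribed drug name drops out of the hypothesis token list and counts as a deletion, which is the conservative behaviour you want:

```python
def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate: edit distance over word tokens divided by
    reference length (substitutions + insertions + deletions)."""
    m, n = len(reference), len(hypothesis)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n] / max(m, 1)

def medical_term_wer(reference: list[str], hypothesis: list[str],
                     term_set: set[str]) -> float:
    """WER restricted to tokens in the medical term list."""
    return wer([w for w in reference if w in term_set],
               [w for w in hypothesis if w in term_set])
```

Reporting the two numbers separately matters: an 8 percent general WER with a 2 percent medical-term WER is deployable; the reverse is not.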
Layer 2, runtime guards. Hallucination control is built into the prompt and post-processing. The note generator is instructed to cite the transcript line for every clinical fact and to flag any item not present. The post-processor cross-checks medication doses against the formulary, validates ICD codes, and runs a drug interaction screen. The clinician sees a side-by-side view of transcript and draft note. Any unverifiable item is highlighted, not silently included.
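The grounding guard can be as simple as checking that every note sentence carries a resolvable citation. The `[T<n>]` citation convention is an assumption about how the prompt instructs the model to cite; the split pattern covers both Latin and Arabic sentence punctuation:

```python
import re

def unverified_items(draft_note: str, transcript_lines: list[str]) -> list[str]:
    """Return note sentences whose [T<n>] citation is missing or does not
    resolve to a transcript line, so the UI can highlight them."""
    flagged = []
    for sentence in re.split(r"(?<=[.؟!?])\s+", draft_note.strip()):
        cite = re.search(r"\[T(\d+)\]", sentence)
        if cite is None or not 1 <= int(cite.group(1)) <= len(transcript_lines):
            flagged.append(sentence)
    return flagged
```

Anything this function returns is rendered highlighted in the review view; the clinician either confirms the item or deletes it before signing.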
Layer 3, clinician sign-off. No note enters the EMR without explicit clinician signature. The signature is a deliberate UI action, not a default. The clinician remains the legal author of the medical record. The AI is a draft generator, not a co-signer. This pattern is what positions the system as clinical documentation support rather than Software as a Medical Device.
Continuous monitoring catches drift. A small, ongoing audit sample (1 to 2 percent of notes) is reviewed against transcript by a senior consultant every month, with results feeding back into prompt tuning, biasing-list updates, and, if warranted, model retraining.
Regulatory posture
Three regulatory surfaces apply to an Arabic medical scribe in Oman, and the posture is workable for institutions that engage early.
The first is the Ministry of Health framework on medical records, clinical governance, and confidentiality. A clinician-assist documentation tool that produces drafts requiring physician signature operates within existing record-keeping policies and does not require a new licensing category. What the institution does need to demonstrate is a documented clinical governance pathway: who validated the system, how WER and quality were measured, what the escalation path looks like when output is wrong, and how clinician training is delivered.
The second is Royal Decree 6/2022, Oman's Personal Data Protection Law published by MTCIT. Patient health information is sensitive personal data. On-premise processing inside the hospital satisfies the law by construction, since no cross-border transfer is involved. The institution still maintains its own DPIA (data protection impact assessment) covering audio capture, retention, access control, and clinician accountability.
The third is the line between documentation tool and Software as a Medical Device. Pure clinician-assist scribing (transcription plus structured note draft) sits on the documentation side. The moment a system starts producing autonomous diagnostic suggestions, treatment recommendations, or risk scores influencing clinical decisions, it crosses into SaMD territory and needs to be evaluated against the Ministry of Health medical device pathway and against international references such as the FDA AI/ML-enabled medical device guidance. A staged deployment plan keeps these features explicitly out of scope in phase one and only adds them after a separate validation cycle.
Phased deployment
The pattern that works for an Omani tertiary hospital starts in one specialty and grows from there. A typical timeline runs 16 to 24 weeks to first specialty go-live, then incremental rollouts.
Weeks 1 to 4 are scoping and consent. The clinical governance committee chairs the conversation, the chief medical informatics officer sits in, IT and security review the architecture, and one anchor specialty is chosen (often outpatient general medicine, family medicine, or radiology dictation, where the workload is high and the language patterns are tractable). Patient consent language for the recording flow is drafted and approved.
Weeks 4 to 10 are infrastructure and integration. Hardware lands in the hospital data centre, the OS and inference stack are hardened, EMR integration goes through the institution's standard change-management pathway, identity provider linkage is wired, and the user interface is localised into Omani Ministry of Health terminology.
Weeks 10 to 16 are evaluation. The system runs in shadow mode on consenting clinicians: it generates notes that are not entered into the EMR, the clinician compares them to their own notes, and a panel of consultants scores quality on a labelled sample. Word error rate, medical term WER, note faithfulness, and clinician satisfaction are all measured. The bias list is tuned and the prompt is refined.
Weeks 16 to 24 are go-live in the anchor specialty. A small group of trained clinicians moves to live use, clinician sign-off becomes mandatory before EMR entry, and the audit sample begins. Quality, time-saved, and after-hours documentation are tracked weekly for the first quarter.
Subsequent specialties roll out at four-to-six-week increments. The Tower or Rack capacity is sized at the start so additional specialties join without re-procurement. Mu'een, Oman's national shared-AI platform, can complement the institutional system as a national reference for cross-hospital terminology standards. Within twelve to eighteen months a hospital-wide deployment is routine clinical infrastructure, not a project.
If your hospital, ministry, or healthcare operator is evaluating an Arabic medical scribe and wants a one-hour briefing tailored to your specialty mix, EMR, and concurrency targets, the next step is straightforward. Email [email protected] or message +968 9889 9100. We will come to you in Muscat or anywhere in Oman, walk through the pipeline against your real clinical flows, and produce a written sizing and timeline. Pricing is by quotation, scoped to your concurrency and integration depth.
Frequently asked
Can the system understand Omani Arabic dialect during a consultation?
Yes, with adaptation. Whisper Large V3 covers Modern Standard Arabic and the broad Gulf dialect cluster out of the box, but clinical Omani conversations often code-switch into English medical terminology and back, and use locality-specific descriptors for symptoms. The deployment pattern is: start with Whisper Large V3, evaluate accuracy on a labelled sample of 50 to 100 redacted consultations, then fine-tune the model and add a domain biasing list of medications, anatomy, and ICD codes used at the institution. Word error rate typically lands below 8 percent for structured consultation speech after this step.
Does the AI replace the clinician's note, or assist it?
It assists. The clinician remains the legal author of the medical record. The AI produces a structured draft (SOAP, problem list, medications, follow-up) within seconds of the visit ending. The clinician reviews, edits, and signs. No note enters the electronic medical record without an explicit clinician signature event. This is the same pattern used by Abridge, Suki, and Nuance DAX in the United States and is consistent with how regulators frame AI-assisted documentation.
How is patient privacy protected?
Audio and notes never leave the hospital perimeter. The Whisper transcription model and the Arabic language model run on hospital-owned hardware. Audio is processed in memory or on encrypted ephemeral storage and discarded after the structured note is generated and signed. The system integrates with the hospital's identity provider so access is governed by the same role-based controls as the EMR. This posture aligns with Oman's Personal Data Protection Law (Royal Decree 6/2022) and with Ministry of Health expectations on medical record confidentiality.
How are hallucinations controlled in a high-stakes setting?
Through a chain of constraints. The transcription stage produces a verbatim transcript that the model is instructed to ground in. The note generation prompt forbids facts not present in the transcript and asks the model to flag any item it could not source. A post-processing step checks medication doses against a formulary, flags drug interactions, and validates ICD codes against the active code set. The clinician reviews a side-by-side view of transcript and draft note. Any unverifiable claim is highlighted for explicit confirmation.
What hardware does a tertiary referral hospital need?
It depends on concurrent dictation load. A Hosn Tower configuration with 96 GB of GPU memory comfortably serves around 50 concurrent active transcriptions plus note generation, suitable for a busy outpatient clinic or a single department. A Hosn Rack with multiple H100 or H200 accelerators is the right tier for a tertiary hospital running 200 to 500 concurrent encounters across departments, with capacity headroom for fine-tuning runs and retrospective batch processing.
What is the regulatory category in Oman for this kind of system?
An AI medical scribe operating in clinician-assist mode (no autonomous diagnostic or treatment output) is a documentation tool, not a Software as a Medical Device. The relevant compliance surface is Ministry of Health policies on medical records, Royal Decree 6/2022 on personal data, and the institution's own clinical governance framework. If the same vendor later adds diagnostic suggestion features, those modules cross into SaMD territory and need to be evaluated against the Ministry of Health medical device regulations and international references such as the FDA AI/ML guidance.