Context-Aware Whisper for Arabic ASR Under Linguistic Varieties

📅 2025-11-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address poor robustness, hallucination, and speaker mismatch in low-resource Arabic automatic speech recognition (ASR), which stem from dialectal diversity and scarce labeled data, this paper proposes a fine-tuning-free, context-aware prompting framework for Whisper. The method combines decoder-side prompting with encoder-side speech prefixing that uses speaker-aware synthetic speech prefixes, and further leverages acoustic-semantic-lexical multimodal retrieval, prompt re-ranking, and zero-shot prompt learning for cross-dialect adaptation. Evaluated across nine Arabic linguistic conditions, including Modern Standard Arabic (MSA), the framework achieves up to a 22.3% relative reduction in word error rate (WER) on MSA and a 9.2% WER reduction on dialectal speech, demonstrating substantially improved recognition stability and generalization under realistic, resource-constrained conditions.
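To make the decoder-side component concrete, here is a minimal sketch of two-pass decoder prompting with the openai-whisper package. The checkpoint size, file name, and the choice to reuse the first-pass transcript as the prompt are illustrative assumptions, not the paper's exact recipe.

```python
# A minimal sketch of decoder-side prompting with the openai-whisper package.
# The two-pass flow (first-pass transcript fed back as a prompt) follows the
# summary above; model size, audio path, and language code are assumptions.
import whisper

model = whisper.load_model("small")  # any multilingual Whisper checkpoint

# Pass 1: unconditioned decoding to obtain a draft transcript.
draft = model.transcribe("utterance.wav", language="ar", task="transcribe")

# Pass 2: feed the draft (or a retrieved similar utterance) to the decoder
# via initial_prompt, biasing it toward in-domain vocabulary and dialect.
final = model.transcribe(
    "utterance.wav",
    language="ar",
    task="transcribe",
    initial_prompt=draft["text"],
)
print(final["text"])
```

In the paper's framework the prompt can also come from retrieved utterances rather than a first pass, and prompts are reordered and re-ranked before decoding.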

📝 Abstract
Low-resource ASR remains a challenging problem, especially for languages like Arabic that exhibit wide dialectal variation and limited labeled data. We propose context-aware prompting strategies to adapt OpenAI's Whisper for Arabic speech recognition without retraining. Our methods include decoder prompting with first-pass transcriptions or retrieved utterances, and encoder prefixing using speech synthesized in the target speaker's voice. We introduce techniques such as prompt reordering, speaker-aware prefix synthesis, and modality-specific retrieval (lexical, semantic, acoustic) to improve transcription in real-world, zero-shot settings. Evaluated on nine Arabic linguistic conditions, our approach reduces WER by up to 22.3% on Modern Standard Arabic and 9.2% on dialectal speech, significantly mitigating hallucinations and speaker mismatch.
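The encoder prefixing idea described in the abstract can be sketched as simple waveform concatenation: a short clip synthesized in the target speaker's voice is prepended to the test utterance before it reaches Whisper's encoder. How the prefix is synthesized (the TTS or voice-cloning system) and the file names below are assumptions, and the paper's exact pipeline may differ.

```python
# A minimal sketch of encoder-side prefixing, assuming a speech prefix has
# already been synthesized in the target speaker's voice (e.g., with any
# voice-cloning TTS). Assumes mono 16 kHz files; names and the silence gap
# are illustrative, not from the paper.
import numpy as np
import soundfile as sf
import whisper

SR = 16000  # Whisper expects 16 kHz mono audio

prefix, _ = sf.read("speaker_prefix_synth.wav", dtype="float32")  # synthesized prefix
target, _ = sf.read("utterance.wav", dtype="float32")             # utterance to transcribe
silence = np.zeros(int(0.3 * SR), dtype=np.float32)               # short gap between the two

# The concatenated waveform is what the Whisper encoder actually sees.
audio = np.concatenate([prefix, silence, target])

model = whisper.load_model("small")
result = model.transcribe(audio, language="ar", task="transcribe")
print(result["text"])
```

In practice the portion of the output corresponding to the prefix would need to be stripped from the final transcript; this sketch omits that step.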
Problem

Research questions and friction points this paper is trying to address.

Adapting Whisper for Arabic ASR without retraining
Addressing dialectal variation and limited labeled data
Reducing word error rates and mitigating hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using context-aware prompting without retraining Whisper
Applying decoder prompting with retrieved or transcribed utterances
Employing speaker-aware prefix synthesis and multimodal retrieval (a retrieval sketch follows this list)
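As referenced in the last item above, the following is a minimal sketch of the semantic branch of modality-specific retrieval, assuming a pool of in-domain reference transcripts and a multilingual sentence-embedding model; the model name, query, and pool contents are placeholders, and the lexical and acoustic branches as well as the prompt re-ranking step are not shown.

```python
# A minimal sketch of semantic retrieval for prompt selection. The embedding
# model name and the use of a first-pass draft as the query are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

draft = "first-pass Whisper transcript goes here"        # query (placeholder)
pool = ["candidate transcript 1", "candidate transcript 2"]  # in-domain references (placeholders)

query_emb = encoder.encode(draft, convert_to_tensor=True)
pool_emb = encoder.encode(pool, convert_to_tensor=True)

# Rank candidates by cosine similarity and keep the best match as the prompt.
scores = util.cos_sim(query_emb, pool_emb)[0]
best_prompt = pool[int(scores.argmax())]
```

The selected candidate would then be passed to Whisper as the decoder prompt; combining it with lexical overlap and acoustic similarity scores would correspond to the multimodal re-ranking the paper describes.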