Context-Aware Whisper for Arabic ASR Under Linguistic Varieties

📅 2025-11-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address poor robustness, hallucination, and speaker mismatch in low-resource Arabic automatic speech recognition (ASR), which stem from dialectal diversity and scarce labeled data, this paper proposes a fine-tuning-free, context-aware prompting framework for Whisper. The method combines decoder-side prompting with encoder-side speech prefixing that uses speaker-aware synthetic speech prefixes, and further leverages acoustic-semantic-lexical multimodal retrieval, prompt re-ranking, and zero-shot prompt learning for cross-dialect adaptation. Evaluated across nine Arabic linguistic conditions, including Modern Standard Arabic (MSA), the framework achieves up to a 22.3% relative reduction in word error rate (WER) on MSA and a 9.2% WER reduction on dialectal speech, demonstrating substantially improved recognition stability and generalization under realistic, resource-constrained conditions.
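To make the decoder-side component concrete, here is a minimal sketch of two-pass decoder prompting with the openai-whisper package. The checkpoint size, file name, and the choice to reuse the first-pass transcript as the prompt are illustrative assumptions, not the paper's exact recipe.

```python
# A minimal sketch of decoder-side prompting with the openai-whisper package.
# The two-pass flow (first-pass transcript fed back as a prompt) follows the
# summary above; model size, audio path, and language code are assumptions.
import whisper

model = whisper.load_model("small")  # any multilingual Whisper checkpoint

# Pass 1: unconditioned decoding to obtain a draft transcript.
draft = model.transcribe("utterance.wav", language="ar", task="transcribe")

# Pass 2: feed the draft (or a retrieved similar utterance) to the decoder
# via initial_prompt, biasing it toward in-domain vocabulary and dialect.
final = model.transcribe(
    "utterance.wav",
    language="ar",
    task="transcribe",
    initial_prompt=draft["text"],
)
print(final["text"])
```

In the paper's framework the prompt can also come from retrieved utterances rather than a first pass, and prompts are reordered and re-ranked before decoding.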

📝 Abstract
Low-resource ASR remains a challenging problem, especially for languages like Arabic that exhibit wide dialectal variation and limited labeled data. We propose context-aware prompting strategies to adapt OpenAI's Whisper for Arabic speech recognition without retraining. Our methods include decoder prompting with first-pass transcriptions or retrieved utterances, and encoder prefixing using speech synthesized in the target speaker's voice. We introduce techniques such as prompt reordering, speaker-aware prefix synthesis, and modality-specific retrieval (lexical, semantic, acoustic) to improve transcription in real-world, zero-shot settings. Evaluated on nine Arabic linguistic conditions, our approach reduces WER by up to 22.3% on Modern Standard Arabic and 9.2% on dialectal speech, significantly mitigating hallucinations and speaker mismatch.
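The encoder prefixing idea described in the abstract can be sketched as simple waveform concatenation: a short clip synthesized in the target speaker's voice is prepended to the test utterance before it reaches Whisper's encoder. How the prefix is synthesized (the TTS or voice-cloning system) and the file names below are assumptions, and the paper's exact pipeline may differ.

```python
# A minimal sketch of encoder-side prefixing, assuming a speech prefix has
# already been synthesized in the target speaker's voice (e.g., with any
# voice-cloning TTS). Assumes mono 16 kHz files; names and the silence gap
# are illustrative, not from the paper.
import numpy as np
import soundfile as sf
import whisper

SR = 16000  # Whisper expects 16 kHz mono audio

prefix, _ = sf.read("speaker_prefix_synth.wav", dtype="float32")  # synthesized prefix
target, _ = sf.read("utterance.wav", dtype="float32")             # utterance to transcribe
silence = np.zeros(int(0.3 * SR), dtype=np.float32)               # short gap between the two

# The concatenated waveform is what the Whisper encoder actually sees.
audio = np.concatenate([prefix, silence, target])

model = whisper.load_model("small")
result = model.transcribe(audio, language="ar", task="transcribe")
print(result["text"])
```

In practice the portion of the output corresponding to the prefix would need to be stripped from the final transcript; this sketch omits that step.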
Problem

Research questions and friction points this paper is trying to address.

Adapting Whisper for Arabic ASR without retraining
Addressing dialectal variation and limited labeled data
Reducing word error rates and mitigating hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using context-aware prompting without retraining Whisper
Applying decoder prompting with retrieved or transcribed utterances
Employing speaker-aware prefix synthesis and multimodal retrieval (a retrieval sketch follows this list)
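As referenced in the last item above, the following is a minimal sketch of the semantic branch of modality-specific retrieval, assuming a pool of in-domain reference transcripts and a multilingual sentence-embedding model; the model name, query, and pool contents are placeholders, and the lexical and acoustic branches as well as the prompt re-ranking step are not shown.

```python
# A minimal sketch of semantic retrieval for prompt selection. The embedding
# model name and the use of a first-pass draft as the query are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

draft = "first-pass Whisper transcript goes here"        # query (placeholder)
pool = ["candidate transcript 1", "candidate transcript 2"]  # in-domain references (placeholders)

query_emb = encoder.encode(draft, convert_to_tensor=True)
pool_emb = encoder.encode(pool, convert_to_tensor=True)

# Rank candidates by cosine similarity and keep the best match as the prompt.
scores = util.cos_sim(query_emb, pool_emb)[0]
best_prompt = pool[int(scores.argmax())]
```

The selected candidate would then be passed to Whisper as the decoder prompt; combining it with lexical overlap and acoustic similarity scores would correspond to the multimodal re-ranking the paper describes.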