Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis

📅 2026-04-23

🤖 AI Summary
This work addresses two limitations of existing ASR-based mispronunciation detection methods: they struggle to capture transient pronunciation deviations because CTC alignment is coarse-grained, and they suffer from prediction bias when canonical pronunciation priors are incorporated explicitly. To overcome these issues, the authors propose a prompt-free framework that decouples acoustic modeling from canonical pronunciation guidance. Specifically, they introduce the CROTTC model, which enforces frame-level monotonic alignment to localize deviations precisely, and combine it with an implicit feedback (IF) strategy that injects mispronunciation knowledge without inducing linguistic bias. Evaluated on the L2-ARCTIC and Iqra'Eval2 datasets, the proposed approach achieves F1-scores of 71.77% and 71.70%, respectively, which the authors present as evidence of robust mispronunciation detection.
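For readers unfamiliar with the metric, the sketch below shows how an F1-score for the "mispronounced" class is computed from raw detection counts. It is a generic illustration, not the paper's evaluation script: the function name `detection_f1` and the counts are hypothetical, and the exact counting protocol used on L2-ARCTIC and Iqra'Eval2 is an assumption not stated in this summary.

```python
def detection_f1(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 for the 'mispronounced' class, computed from raw counts.

    true_pos:  mispronounced phones correctly flagged
    false_pos: correctly pronounced phones wrongly flagged
    false_neg: mispronounced phones that were missed
    """
    if true_pos == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
score = detection_f1(true_pos=8, false_pos=2, false_neg=2)
print(f"F1 = {score:.2%}")
```

Because F1 is a harmonic mean, a system cannot score well by flagging everything (high recall, low precision) or by flagging almost nothing (high precision, low recall).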

📝 Abstract
Mispronunciation Detection and Diagnosis (MDD) requires modeling fine-grained acoustic deviations. However, current ASR-derived MDD systems face inherent limitations. In particular, CTC-based models favor sequence-level alignments that neglect transient mispronunciation cues, while explicit canonical priors bias predictions toward intended targets. To address these bottlenecks, we propose a prompt-free framework that decouples acoustic fidelity from canonical guidance. First, we introduce CROTTC, an acoustic model that enforces monotonic, frame-level alignment to accurately capture pronunciation deviations. Second, we implicitly inject mispronunciation information via the implicit feedback (IF) strategy, following the knowledge transfer principle. Experiments show that CROTTC-IF achieves a 71.77% F1-score on L2-ARCTIC and a 71.70% F1-score on the Iqra'Eval2 leaderboard. Our empirical analysis indicates that decoupling acoustics from explicit canonical priors yields robust MDD.
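The point about sequence-level CTC alignment can be illustrated with the standard CTC collapse rule (merge repeated labels, then drop blanks). This is a minimal sketch of generic CTC decoding, not of CROTTC, whose internals are not described here; the blank symbol, the phone labels, and the frame sequences are hypothetical.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol for this illustration


def ctc_collapse(frames):
    """Apply the standard CTC collapse: merge repeats, then remove blanks."""
    merged = [label for label, _ in groupby(frames)]
    return [label for label in merged if label != BLANK]


# Two hypothetical frame-level label sequences for the same word.
# In the first, the vowel spans four frames; in the second, only one.
long_vowel = ["_", "k", "k", "ae", "ae", "ae", "ae", "_", "t", "_"]
short_vowel = ["_", "k", "ae", "_", "_", "_", "_", "_", "t", "t"]

# Both collapse to the same phone sequence, so the sequence-level output
# carries no information about where or how long each phone occurred.
assert ctc_collapse(long_vowel) == ctc_collapse(short_vowel) == ["k", "ae", "t"]
```

Since many different frame-level paths map to the same output sequence, a model trained only on the collapsed sequence has little incentive to place labels on the exact frames where a brief deviation occurs, which is the limitation that frame-level alignment is meant to address.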

Problem

Research questions and friction points this paper is trying to address.

Mispronunciation Detection and Diagnosis
Acoustic Sparsity
Linguistic Bias
CTC-based Models
Canonical Priors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt-Free
Mispronunciation Detection
Frame-Level Alignment
Acoustic Decoupling
Implicit Feedback