🤖 AI Summary
Current speech recognition models struggle to model pronunciation deviations, such as those caused by accents and dysfluencies, at the phoneme level, which limits the accuracy of automatic pronunciation assessment. To address this, we propose an end-to-end approach that combines multi-task learning with explicit phoneme similarity modeling, enabling fine-grained characterization of discrepancies between actual and canonical pronunciations. We construct and publicly release VCTK-accent, the first synthetic dataset specifically designed for pronunciation error modeling, and we introduce two novel metrics for quantifying pronunciation divergence. Experiments show that our method significantly improves phoneme-level transcription accuracy, particularly for non-native or atypical pronunciations, and is more robust than prior approaches. Our work establishes a new benchmark for phonetic error detection and advances the state of the art in automatic pronunciation assessment.
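The summary does not spell out the multi-task objective, so the sketch below shows one plausible formulation for intuition only: a CTC branch that transcribes the phonemes actually spoken, plus an auxiliary cross-entropy against targets smoothed by a phoneme-similarity matrix, so near-miss phonemes are penalized less than unrelated ones. The vocabulary size `V`, the similarity matrix `sim`, the mixing weight `alpha`, and the task weight `lam` are all illustrative assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

V = 44       # assumed phoneme vocabulary size (e.g., ARPAbet plus CTC blank)
BLANK = 0    # assumed CTC blank index

def soft_targets(labels, sim, alpha=0.2):
    """Mix one-hot phoneme labels with rows of a row-normalized phoneme
    similarity matrix; the mixing scheme is illustrative, not the paper's."""
    hard = F.one_hot(labels, num_classes=sim.size(0)).float()
    return (1.0 - alpha) * hard + alpha * sim[labels]

def multitask_loss(ctc_log_probs, dec_logits, labels, in_lens, lab_lens,
                   sim, lam=0.5):
    """Hypothetical two-task objective: CTC loss for verbatim phoneme
    transcription + similarity-smoothed cross-entropy on decoder logits."""
    ctc = F.ctc_loss(ctc_log_probs, labels, in_lens, lab_lens, blank=BLANK)
    tgt = soft_targets(labels.reshape(-1), sim)              # (N*S, V)
    logp = F.log_softmax(dec_logits.reshape(-1, V), dim=-1)  # (N*S, V)
    aux = -(tgt * logp).sum(dim=-1).mean()
    return ctc + lam * aux

# Toy usage: T=50 encoder frames, N=2 utterances, S=12 target phonemes.
torch.manual_seed(0)
ctc_log_probs = F.log_softmax(torch.randn(50, 2, V), dim=-1)  # (T, N, V)
dec_logits = torch.randn(2, 12, V)                            # (N, S, V)
labels = torch.randint(1, V, (2, 12))                         # no blank labels
sim = F.normalize(torch.rand(V, V), p=1, dim=-1)              # stand-in matrix
loss = multitask_loss(ctc_log_probs, dec_logits, labels,
                      torch.full((2,), 50, dtype=torch.long),
                      torch.full((2,), 12, dtype=torch.long), sim)
print(loss.item())
```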
📝 Abstract
Phonetic error detection, a core subtask of automatic pronunciation assessment, identifies pronunciation deviations at the phoneme level. Speech variability from accents and dysfluencies challenges accurate phoneme recognition, with current models failing to capture these discrepancies effectively. We propose a verbatim phoneme recognition framework using multi-task training with novel phoneme similarity modeling that transcribes what speakers actually say rather than what they're supposed to say. We develop and open-source *VCTK-accent*, a simulated dataset containing phonetic errors, and propose two novel metrics for assessing pronunciation differences. Our work establishes a new benchmark for phonetic error detection.
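The abstract does not define the two proposed metrics, so as one plausible stand-in, the sketch below computes a similarity-weighted phoneme error rate between the verbatim (recognized) and canonical phoneme sequences: a Levenshtein distance whose substitution cost is discounted by phoneme similarity. The `sim` scores, edit costs, and normalization are assumptions for illustration, not the paper's metrics.

```python
def weighted_per(hyp, ref, sim, ins_cost=1.0, del_cost=1.0):
    """Illustrative similarity-weighted phoneme error rate: edit distance
    where substituting similar phonemes costs less, over reference length."""
    m, n = len(hyp), len(ref)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * del_cost
    for j in range(1, n + 1):
        d[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if hyp[i - 1] == ref[j - 1]:
                sub = d[i - 1][j - 1]  # exact match: free
            else:
                # near-miss substitutions (e.g., /t/ for /d/) cost less
                sub = d[i - 1][j - 1] + 1.0 - sim.get((hyp[i - 1], ref[j - 1]), 0.0)
            d[i][j] = min(sub, d[i - 1][j] + del_cost, d[i][j - 1] + ins_cost)
    return d[m][n] / max(n, 1)

# Toy example: a speaker devoices the final /d/ of "word" to /t/.
sim = {("t", "d"): 0.8, ("d", "t"): 0.8}  # hypothetical similarity scores
canonical = ["w", "er", "d"]
spoken = ["w", "er", "t"]
print(weighted_per(spoken, canonical, sim))  # 0.2 / 3 ≈ 0.067
```

A plain phoneme error rate would charge this accent-driven devoicing the same as an arbitrary substitution; weighting by similarity keeps the metric sensitive to genuine errors while staying tolerant of close phonetic neighbors.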