Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling

📅 2025-07-18
🤖 AI Summary
Current speech recognition models exhibit limited capability in phoneme-level modeling of pronunciation deviations (such as accents and dysfluencies), thereby constraining the accuracy of automatic pronunciation assessment. To address this, we propose an end-to-end approach integrating multi-task learning with explicit phoneme similarity modeling, enabling fine-grained characterization of discrepancies between actual and canonical pronunciations. We construct and publicly release VCTK-accent, the first synthetic dataset specifically designed for pronunciation error modeling. Additionally, we introduce two novel metrics for quantifying pronunciation divergence. Experiments demonstrate that our method significantly improves phoneme-level transcription accuracy, particularly under non-native or atypical pronunciation conditions, and exhibits enhanced robustness compared to prior approaches. Our work establishes a new benchmark for pronunciation error detection and advances the state of the art in automatic pronunciation assessment.

📝 Abstract
Phonetic error detection, a core subtask of automatic pronunciation assessment, identifies pronunciation deviations at the phoneme level. Speech variability from accents and dysfluencies challenges accurate phoneme recognition, with current models failing to capture these discrepancies effectively. We propose a verbatim phoneme recognition framework using multi-task training with novel phoneme similarity modeling that transcribes what speakers actually say rather than what they're supposed to say. We develop and open-source VCTK-accent, a simulated dataset containing phonetic errors, and propose two novel metrics for assessing pronunciation differences. Our work establishes a new benchmark for phonetic error detection.
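
To make the distinction between verbatim and canonical transcription concrete, the sketch below aligns a canonical phoneme sequence with a verbatim one and labels each position as a match, substitution, deletion, or insertion. It uses a standard Levenshtein alignment, which is an assumption made for illustration and not the method or the metrics proposed in the paper. The function names `align_phonemes` and `phoneme_error_rate` and the example phoneme strings are hypothetical.

```python
from typing import List, Tuple


def align_phonemes(canonical: List[str], spoken: List[str]) -> List[Tuple[str, str, str]]:
    """Align two phoneme sequences with unit-cost edit distance.

    Returns a list of (operation, canonical_phoneme, spoken_phoneme) tuples,
    where operation is "match", "substitution", "deletion", or "insertion".
    """
    n, m = len(canonical), len(spoken)
    # dist[i][j] = edit distance between canonical[:i] and spoken[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == spoken[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # canonical phoneme was dropped
                dist[i][j - 1] + 1,        # extra phoneme was spoken
            )

    # Walk back from the end of both sequences to recover the operations.
    ops: List[Tuple[str, str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = canonical[i - 1] == spoken[j - 1]
            if dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
                ops.append(("match" if same else "substitution",
                            canonical[i - 1], spoken[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("deletion", canonical[i - 1], ""))
            i -= 1
        else:
            ops.append(("insertion", "", spoken[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def phoneme_error_rate(canonical: List[str], spoken: List[str]) -> float:
    """Fraction of non-matching alignment operations per canonical phoneme."""
    errors = sum(1 for op, _, _ in align_phonemes(canonical, spoken) if op != "match")
    return errors / max(len(canonical), 1)


# Illustrative example: canonical "think" versus a verbatim "sink".
canonical = ["θ", "ɪ", "ŋ", "k"]
spoken = ["s", "ɪ", "ŋ", "k"]
errors = [op for op in align_phonemes(canonical, spoken) if op[0] != "match"]
```

Here `errors` holds the single substitution of "θ" by "s". A system that outputs only the canonical sequence would report no error at all, which is why a verbatim transcription is needed before any comparison like this one is meaningful.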
Problem

Research questions and friction points this paper is trying to address.

Detects phonetic errors in pronunciation assessment
Addresses speech variability from accents and dysfluencies
Improves phoneme recognition accuracy through similarity modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task training with phoneme similarity modeling
Verbatim phoneme recognition framework
Novel metrics for pronunciation differences
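
As a rough illustration of why phoneme similarity matters when scoring pronunciation differences, the sketch below computes an edit distance in which substituting a similar phoneme costs less than substituting a dissimilar one. Everything here is an assumption for illustration: the tiny `FEATURES` table, the Jaccard-based `similarity` function, and `weighted_distance` are hypothetical and are neither the paper's similarity model nor its proposed metrics.

```python
from typing import Dict, List, Set

# Hypothetical, hand-written articulatory features for a few phonemes.
FEATURES: Dict[str, Set[str]] = {
    "θ": {"fricative", "dental", "voiceless"},
    "s": {"fricative", "alveolar", "voiceless"},
    "k": {"stop", "velar", "voiceless"},
    "ɪ": {"vowel", "high", "front"},
    "ŋ": {"nasal", "velar", "voiced"},
}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of feature sets; 1.0 for identical phonemes."""
    if a == b:
        return 1.0
    fa, fb = FEATURES.get(a, set()), FEATURES.get(b, set())
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


def weighted_distance(canonical: List[str], spoken: List[str]) -> float:
    """Edit distance where a substitution costs 1 - similarity."""
    m = len(spoken)
    prev = [float(j) for j in range(m + 1)]
    for i in range(1, len(canonical) + 1):
        cur = [float(i)] + [0.0] * m
        for j in range(1, m + 1):
            sub = 1.0 - similarity(canonical[i - 1], spoken[j - 1])
            cur[j] = min(prev[j - 1] + sub,  # substitution, scaled by similarity
                         prev[j] + 1.0,      # deletion
                         cur[j - 1] + 1.0)   # insertion
        prev = cur
    return prev[m]


think = ["θ", "ɪ", "ŋ", "k"]
near = weighted_distance(think, ["s", "ɪ", "ŋ", "k"])  # θ -> s, two shared features
far = weighted_distance(think, ["k", "ɪ", "ŋ", "k"])   # θ -> k, one shared feature
```

With this toy table, `near` is smaller than `far`, whereas a unit-cost edit distance would score both pronunciations identically. A learned similarity model could replace the hand-written table without changing the dynamic program.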
Authors

Xuanru Zhou, Zhejiang University (Speech Processing, Multimodal, Representation Learning)
Jiachen Lian, UC Berkeley (precision healthcare, speech processing, machine learning)
Cheol Jun Cho, UC Berkeley, EECS (AI, Machine Learning, Speech Processing, Neuroscience, Brain-Computer Interfaces)
Tejas Prabhune, UC Berkeley, United States
Shuhe Li, Zhejiang University, China
William Li, UC Berkeley, United States
Rodrigo Ortiz, UC Berkeley, United States
Zoe Ezzes, Research Speech-Language Pathologist, University of California, San Francisco (language, cognition, aphasia, neurogenic communication disorders)
Jet Vonk, UCSF, United States
Brittany Morin, UCSF, United States
Rian Bogley, UCSF, United States
Lisa Wauters, UCSF, United States
Zachary Miller, Associate Professor of Neurology, UCSF Memory and Aging Center (Behavioral Neurology, Dementia, Neurodevelopment, Immunology)
Maria Gorno-Tempini, UCSF, United States
Gopala Anumanchipalli, UC Berkeley, United States