Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish

📅 2025-06-01
🤖 AI Summary
To address the scarcity of mispronunciation detection (MD) models for low-resource languages such as Finland Swedish, this paper proposes a framework that needs minimal second-language (L2) data. The model is trained on 89 hours of native (L1) spontaneous speech, and 33 minutes of transcribed L2 read-aloud speech serve as the test set, so no L2 pronunciation annotations are required for training. The method trains a multilingual wav2vec 2.0 model with entropy regularization and applies temperature scaling and top-k normalization as post-processing after inference. Because none of these steps depends on the target language, the authors describe the approach as suitable for other low-resource languages. Its main contribution is this simplicity, which removes the need for annotated L2 training data. On the Finland Swedish test set, the approach reaches 43.2% recall and 29.8% precision, compared with 77.5% recall and 17.6% precision for the baseline: a 12.2 percentage-point gain in precision at the cost of a 34.3-point drop in recall.
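The trade-off between the two operating points can be checked with a few lines of arithmetic. The sketch below uses only the precision and recall values reported in the abstract. F1 is not reported there; it is computed here as a derived, illustrative summary and should not be read as a figure from the paper.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Operating points reported in the abstract (percent).
proposed = {"precision": 29.8, "recall": 43.2}
baseline = {"precision": 17.6, "recall": 77.5}

# Differences in percentage points (pp).
precision_gain = proposed["precision"] - baseline["precision"]
recall_loss = baseline["recall"] - proposed["recall"]

print(f"Precision gain: {precision_gain:.1f} pp")  # 12.2 pp
print(f"Recall drop:    {recall_loss:.1f} pp")     # 34.3 pp
# F1 is derived here for illustration only; the abstract does not report it.
print(f"F1 proposed: {f1(**proposed):.1f}%")       # 35.3%
print(f"F1 baseline: {f1(**baseline):.1f}%")       # 28.7%
```

Under this derived metric, the proposed operating point scores higher than the baseline, which is consistent with the abstract's claim of a better precision/recall balance.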


📝 Abstract
Mispronunciation detection (MD) models are the cornerstones of many language learning applications. Unfortunately, most systems are built for English and other major languages, while low-resource language varieties, such as Finland Swedish (FS), lack such tools. In this paper, we introduce our MD model for FS, trained on 89 hours of first language (L1) speakers' spontaneous speech and tested on 33 minutes of L2 transcribed read-aloud speech. We trained a multilingual wav2vec 2.0 model with entropy regularization and applied temperature scaling and top-k normalization after inference to better adapt it for MD. The main novelty of our method lies in its simplicity, requiring minimal L2 data. The process is also language-independent, making it suitable for other low-resource languages. Our proposed algorithm achieves a better balance between Recall (43.2%) and Precision (29.8%) than the baseline model (Recall 77.5%, Precision 17.6%).
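The abstract does not give the exact form of the entropy regularization. A common variant is the "confidence penalty", which subtracts a scaled entropy of the model's output distribution from the training loss so that over-confident, sharply peaked posteriors are discouraged. The sketch below assumes that form, applied to per-frame output logits. The function names, the weight `beta`, and the sign convention are all hypothetical; the paper's formulation may differ. The base loss (for example a CTC loss) is passed in as a number rather than computed.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one frame of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_regularized_loss(base_loss, frame_logits, beta=0.1):
    """Hypothetical confidence-penalty form: base_loss - beta * mean frame entropy.

    Subtracting the entropy term lowers the loss for flatter (less
    over-confident) posteriors. `beta` is an assumed hyperparameter,
    not a value taken from the paper.
    """
    mean_h = sum(entropy(softmax(f)) for f in frame_logits) / len(frame_logits)
    return base_loss - beta * mean_h

# Usage: two frames of made-up logits over a 3-symbol inventory.
frames = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.1]]
print(entropy_regularized_loss(base_loss=1.25, frame_logits=frames, beta=0.1))
```

The intuition for MD is that a model trained only on L1 speech tends to be over-confident; keeping its posteriors less peaked may make low scores on L2 phones more informative.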
Problem

Research questions and friction points this paper is trying to address.

Detects mispronunciations in Finland Swedish without L2 training data
Addresses the lack of MD tools for low-resource language varieties
Builds a language-independent MD model that requires minimal L2 data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual wav2vec 2.0 with entropy regularization
Temperature scaling and top-k normalization
Minimal L2 data, language-independent approach
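The abstract names temperature scaling and top-k normalization but does not specify how they combine into an MD decision. The sketch below shows one plausible reading. Per-phone logits are divided by a temperature before the softmax. Only the k most probable phones are kept and renormalized. The expected (canonical) phone is then flagged as mispronounced if its post-processed probability falls below a threshold. The function names and the values of `temperature`, `k`, and `threshold` are hypothetical, as is the thresholding rule itself.

```python
import math

def temperature_softmax(logits, temperature):
    """Divide logits by T before softmax; T > 1 flattens, T < 1 sharpens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_normalize(probs, k):
    """Keep the k largest probabilities, zero the rest, renormalize to sum to 1."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]

def flag_mispronunciation(logits, canonical_idx, temperature=2.0, k=3, threshold=0.3):
    """Hypothetical MD rule: flag the phone when the canonical phone's
    post-processed probability is below `threshold` (all defaults assumed)."""
    probs = top_k_normalize(temperature_softmax(logits, temperature), k)
    return probs[canonical_idx] < threshold

# Usage: made-up logits for one phone position over a 4-phone inventory.
logits = [4.0, 1.0, 0.5, 0.0]
print(flag_mispronunciation(logits, canonical_idx=0))  # False: canonical phone dominates
print(flag_mispronunciation(logits, canonical_idx=3))  # True: canonical phone falls outside top-k
```

In this reading, the temperature and k act as the knobs that trade recall for precision: a higher threshold or smaller k flags more phones, while a flatter distribution makes the canonical phone's score less extreme.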
Nhan Phan, PhD student, Aalto University (Automatic Speech Recognition, Automatic Speaking Assessment, MDD, LLM)
M. Kuronen, Department of Language and Communication Studies, University of Jyväskylä, Finland
Maria Kautonen, Department of Language and Communication Studies, University of Jyväskylä, Finland
Riikka Ullakonoja
Anna von Zansen (computer-assisted testing, educational technology, foreign language teaching and learning, language assessment, listening compre…)
Yaroslav Getman, Aalto University (ASR)
Ekaterina Voskoboinik, Department of Information and Communications Engineering, Aalto University, Finland
Tamás Grósz, Department of Information and Communications Engineering, Aalto University, Finland
Mikko Kurimo, Professor in Speech and Language Processing, Aalto University, Finland (speech recognition, machine learning, language modeling)