🤖 AI Summary
To address the scarcity of mispronunciation detection (MD) models for low-resource language varieties such as Finland Swedish, this paper proposes a framework with minimal dependence on second-language (L2) data: the model is trained on 89 hours of first-language (L1) spontaneous speech and evaluated on only 33 minutes of transcribed L2 read-aloud speech, with no L2 pronunciation-error annotations required. The method fine-tunes a multilingual wav2vec 2.0 model with entropy-regularized training and applies temperature scaling and top-k normalization as post-processing after inference, making the pipeline language-independent and transferable to other low-resource languages. Its core contribution lies in its simplicity and its minimal reliance on annotated L2 data. On the Finland Swedish test set, the approach achieves 43.2% recall at 29.8% precision, improving precision by 12.2 percentage points over the baseline (77.5% recall, 17.6% precision) and yielding a better precision–recall balance.
📝 Abstract
Mispronunciation detection (MD) models are the cornerstones of many language learning applications. Unfortunately, most systems are built for English and other major languages, while low-resourced language varieties, such as Finland Swedish (FS), lack such tools. In this paper, we introduce our MD model for FS, trained on 89 hours of first language (L1) speakers' spontaneous speech and tested on 33 minutes of L2 transcribed read-aloud speech. We trained a multilingual wav2vec 2.0 model with entropy regularization, followed by temperature scaling and top-k normalization after inference, to better adapt it for MD. The main novelty of our method lies in its simplicity, requiring minimal L2 data. The process is also language-independent, making it suitable for other low-resource languages. Our proposed algorithm allows us to balance Recall (43.2%) and Precision (29.8%), compared with the baseline model's Recall (77.5%) and Precision (17.6%).
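The post-processing described in the abstract (temperature scaling of the model's outputs followed by top-k normalization of the resulting posteriors) might look roughly like the sketch below. The function names, array shapes, and parameter values here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def temperature_scale(logits, T=2.0):
    """Soften frame-level logits with temperature T, then apply softmax.

    T > 1 flattens the distribution, which can make a model's phone
    posteriors less overconfident before thresholding for MD.
    """
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def top_k_normalize(probs, k=3):
    """Keep only the k largest posteriors per frame and renormalize to 1."""
    # Indices of everything *outside* the top k (argsort is ascending).
    drop = np.argsort(probs, axis=-1)[..., :-k]
    out = probs.copy()
    np.put_along_axis(out, drop, 0.0, axis=-1)
    return out / out.sum(axis=-1, keepdims=True)

# Toy example: one frame of logits over a 6-phone inventory.
logits = np.array([[4.0, 2.0, 1.0, 0.5, 0.2, 0.1]])
probs = temperature_scale(logits, T=2.0)
renorm = top_k_normalize(probs, k=3)
```

A mispronunciation could then be flagged when the renormalized posterior of the canonical (expected) phone falls below a tuned threshold; the thresholding step is an assumption here, as the abstract does not spell out the decision rule.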