Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages

πŸ“… 2025-06-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Automatic speech recognition (ASR) for endangered languages in linguistic fieldwork suffers from extremely low transcription efficiency due to spontaneous, noisy recordings and critically scarce labeled dataβ€”often less than one hour. Method: This study systematically evaluates and optimizes multilingual pretrained models (MMS and XLS-R) across five typologically diverse endangered languages. We introduce linguistics-informed data cleaning, controllable few-shot benchmark construction, and a reproducible fine-tuning protocol. Contribution/Results: We establish, for the first time, clear applicability boundaries: MMS significantly outperforms XLS-R with <1 hour of data, while XLS-R only matches MMS performance beyond 1 hour. Our optimized pipelines yield production-ready ASR systems for all five languages, achieving several-fold improvements in transcription efficiency and directly alleviating the transcription bottleneck in language documentation.

Technology Category

Application Category

πŸ“ Abstract
Automatic Speech Recognition (ASR) has reached impressive accuracy for high-resource languages, yet its utility in linguistic fieldwork remains limited. Recordings collected in fieldwork contexts present unique challenges, including spontaneous speech, environmental noise, and severely constrained datasets from under-documented languages. In this paper, we benchmark the performance of two fine-tuned multilingual ASR models, MMS and XLS-R, on five typologically diverse low-resource languages with control of training data duration. Our findings show that MMS is best suited when extremely small amounts of training data are available, whereas XLS-R shows parity performance once training data exceed one hour. We provide linguistically grounded analysis for further provide insights towards practical guidelines for field linguists, highlighting reproducible ASR adaptation approaches to mitigate the transcription bottleneck in language documentation.
Problem

Research questions and friction points this paper is trying to address.

Improving ASR for low-resource fieldwork languages
Addressing challenges like noise and small datasets
Comparing MMS and XLS-R models' performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned multilingual ASR models
Benchmarked MMS and XLS-R performance
Reproducible ASR adaptation approaches
πŸ”Ž Similar Papers
No similar papers found.