🤖 AI Summary
This study systematically investigates the impact of transcription inconsistency in the low-resource Faetar ASR benchmark. Using a small hand-constructed lexicon integrated with a bigram language model under lexicon-constrained decoding, the authors quantitatively evaluate the effects of transcription noise, language modelling, and decoding constraints on recognition performance. Results show that transcription inconsistency is not the primary bottleneck, and that the bigram word-based language model yields no added benefit, challenging the assumption that language models necessarily help low-resource ASR. In contrast, constraining decoding to a finite lexicon can improve recognition. These findings suggest that lightweight decoding constraints may be more useful than word-level language modelling in extremely low-resource dialectal ASR. Despite these results, the overall task remains highly challenging.
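The lexicon-constrained decoding discussed above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, alphabet, and per-frame scores below are invented for illustration, and the decoder is a simplified beam search that only extends hypotheses which remain prefixes of some lexicon word.

```python
import math

# Toy lexicon; the paper uses a small hand-constructed Faetar lexicon,
# but these words are invented for illustration.
LEXICON = {"cat", "car", "cart"}

def prefixes(lexicon):
    """All prefixes of lexicon words, including the empty string."""
    out = {""}
    for w in lexicon:
        for i in range(1, len(w) + 1):
            out.add(w[:i])
    return out

def constrained_beam_decode(log_probs, alphabet, lexicon, beam=4):
    """Beam search over per-frame character log-probabilities.
    Hypotheses must stay prefixes of some lexicon word, and the final
    output must be a complete lexicon word (else None)."""
    valid = prefixes(lexicon)
    beams = {"": 0.0}
    for frame in log_probs:
        nxt = {}
        for hyp, score in beams.items():
            for ch, lp in zip(alphabet, frame):
                cand = hyp + ch
                if cand in valid:  # prune out-of-lexicon continuations
                    nxt[cand] = max(nxt.get(cand, -math.inf), score + lp)
        beams = dict(sorted(nxt.items(), key=lambda kv: -kv[1])[:beam])
    words = {h: s for h, s in beams.items() if h in lexicon}
    return max(words, key=words.get) if words else None
```

For example, with acoustic scores whose frame-wise argmax spells the non-word "caz", the constrained decoder instead returns the in-lexicon "cat", which is the intuition behind the reported benefit of lexicon constraints.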
📝 Abstract
We examine the role of transcription inconsistencies in the Faetar Automatic Speech Recognition benchmark, a challenging low-resource ASR task. With the help of a small, hand-constructed lexicon, we find that, while inconsistencies do exist in the transcriptions, they are not the main challenge in the task. We also demonstrate that bigram word-based language modelling is of no added benefit, but that constraining decoding to a finite lexicon can be beneficial. The task remains extremely difficult.