🤖 AI Summary
This study systematically investigates the impact of transcription inconsistency in the low-resource Faetar ASR benchmark. Using a small hand-constructed lexicon integrated with a bigram language model under lexicon-constrained decoding, the authors quantitatively evaluate the effects of transcription noise, language modelling, and decoding constraints on recognition performance. Results show that transcription inconsistency is not the primary bottleneck, and that the bigram word-based language model yields no added benefit, challenging the assumption that language models necessarily help low-resource ASR. In contrast, constraining decoding to a finite lexicon can improve recognition. These findings suggest that lightweight decoding constraints may be more useful than word-level language modelling in extremely low-resource dialectal ASR. Despite these results, the overall task remains highly challenging.
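The lexicon-constrained decoding discussed above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, alphabet, and per-frame scores below are invented for illustration, and the decoder is a simplified beam search that only extends hypotheses which remain prefixes of some lexicon word.

```python
import math

# Toy lexicon; the paper uses a small hand-constructed Faetar lexicon,
# but these words are invented for illustration.
LEXICON = {"cat", "car", "cart"}

def prefixes(lexicon):
    """All prefixes of lexicon words, including the empty string."""
    out = {""}
    for w in lexicon:
        for i in range(1, len(w) + 1):
            out.add(w[:i])
    return out

def constrained_beam_decode(log_probs, alphabet, lexicon, beam=4):
    """Beam search over per-frame character log-probabilities.
    Hypotheses must stay prefixes of some lexicon word, and the final
    output must be a complete lexicon word (else None)."""
    valid = prefixes(lexicon)
    beams = {"": 0.0}
    for frame in log_probs:
        nxt = {}
        for hyp, score in beams.items():
            for ch, lp in zip(alphabet, frame):
                cand = hyp + ch
                if cand in valid:  # prune out-of-lexicon continuations
                    nxt[cand] = max(nxt.get(cand, -math.inf), score + lp)
        beams = dict(sorted(nxt.items(), key=lambda kv: -kv[1])[:beam])
    words = {h: s for h, s in beams.items() if h in lexicon}
    return max(words, key=words.get) if words else None
```

For example, with acoustic scores whose frame-wise argmax spells the non-word "caz", the constrained decoder instead returns the in-lexicon "cat", which is the intuition behind the reported benefit of lexicon constraints.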
📝 Abstract
We examine the role of transcription inconsistencies in the Faetar Automatic Speech Recognition benchmark, a challenging low-resource ASR task. With the help of a small, hand-constructed lexicon, we find that, while inconsistencies do exist in the transcriptions, they are not the main challenge in the task. We also demonstrate that bigram word-based language modelling is of no added benefit, but that constraining decoding to a finite lexicon can be beneficial. The task remains extremely difficult.