LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families

📅 2026-03-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic evaluation of speech language models on automatic speech recognition (ASR) in low-resource and cross-linguistic settings, a gap that hinders their deployment in real-world multilingual environments. To close it, the authors introduce the first low-resource ASR benchmark covering 25 languages across nine language families, in both Latin and non-Latin scripts. The benchmark combines multilingual speech datasets, cross-linguistic feature analysis, and a unified model evaluation framework. Experimental results show that state-of-the-art speech language models degrade significantly on low-resource languages and generalize poorly across languages. The benchmark thus enables evaluation of ASR systems under realistic low-resource multilingual conditions and provides infrastructure for future model development and assessment.

📝 Abstract

Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong performance in automatic speech recognition (ASR) under high-resource conditions. However, existing benchmarks predominantly focus on high-resource languages, leaving the ASR behavior of SpeechLMs in low-resource languages insufficiently understood. This gap matters because practical ASR systems must reliably support low-resource languages and generalize across diverse language families; leaving it open directly hinders the deployment of SpeechLM-based ASR in real-world multilingual scenarios. Evaluating SpeechLMs on low-resource languages is therefore essential to establishing how well they generalize across language families. To address this problem, we propose LoASR-Bench, a comprehensive benchmark designed to evaluate low-resource automatic speech recognition (ASR) of the latest SpeechLMs across diverse language families. LoASR-Bench comprises 25 languages from 9 language families, featuring both Latin and non-Latin scripts, enabling cross-linguistic and cross-script assessment of the ASR performance of current SpeechLMs. Experimental results highlight the limitations of the latest SpeechLMs in handling real-world low-resource languages.
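The abstract does not name the scoring metric, so the following is only a minimal sketch of how per-family ASR scoring of this kind could look, assuming word error rate (WER) and character error rate (CER) as the metrics. The function names, the `(family, reference, hypothesis)` sample layout, and the family labels are all hypothetical and are not taken from LoASR-Bench.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch).

    CER is often preferred for scripts without whitespace word boundaries.
    """
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


def per_family_wer(samples):
    """Pool edits and reference words per language family.

    `samples` is an iterable of (family, reference, hypothesis) tuples;
    this layout is hypothetical, not the benchmark's data format.
    """
    edits = defaultdict(int)
    words = defaultdict(int)
    for family, reference, hypothesis in samples:
        ref_words = reference.split()
        edits[family] += edit_distance(ref_words, hypothesis.split())
        words[family] += len(ref_words)
    return {family: edits[family] / words[family] for family in edits}
```

Pooling edits and word counts per family, rather than averaging per-utterance rates, keeps short utterances from dominating a family's score. In practice, text normalization (casing, punctuation, Unicode form) would need to be fixed before scoring, particularly for non-Latin scripts.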

Problem

Research questions and friction points this paper is trying to address.

low-resource

automatic speech recognition

speech language models

language families

multilingual

Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-resource ASR

Speech Language Models

Cross-lingual Evaluation