AI Summary
Multilingual automatic speech recognition (ASR) and language identification (LID) face performance bottlenecks due to high acoustic diversity across languages and severe data scarcity for low-resource languages. Method: We introduce ML-SUPERB, a comprehensive multilingual speech benchmark covering 154 languages, structured into three tracks: research, model submission, and new-language integration. It evaluates self-supervised models (e.g., wav2vec 2.0, XLS-R) with multitask fine-tuning, language-adaptive alignment, and low-resource data augmentation. Contribution/Results: ML-SUPERB is the largest such benchmark to date, incorporating 12 submitted models and corpora from 54 languages. Its systematic evaluation demonstrates that scaling model size alone does not improve multilingual performance; instead, acoustic diversity emerges as the primary bottleneck for cross-lingual generalization. Furthermore, we propose a standardized framework for integrating new languages, significantly advancing low-resource speech technology development.
Abstract
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a Research Track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and that the variety of speech and voice types presents significant challenges in multilingual speech processing.