🤖 AI Summary
Existing speech benchmarks (e.g., SUPERB) exhibit a strong English bias and lack multilingual coverage. To address this, the authors propose ML-SUPERB, a large-scale, highly diverse, and reproducible multilingual speech benchmark spanning 143 languages, including low-resource and endangered varieties, with a primary focus on automatic speech recognition (ASR) and language identification (LID). It adopts the SUPERB paradigm of frozen self-supervised representations (e.g., wav2vec 2.0, HuBERT) paired with lightweight downstream models, and establishes standardized evaluation protocols and unified data preprocessing. Experiments show that SSL features significantly outperform FBANK baselines, and reveal a notable finding: multilingual self-supervised learning (SSL) models do not always outperform their monolingual counterparts. The project releases organized datasets and reproducible training scripts as an open challenge, improving fairness, comparability, and reproducibility in multilingual speech representation research.
📝 Abstract
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic speech recognition and language identification. Following the concept of SUPERB, ML-SUPERB utilizes frozen SSL features and employs a simple framework for multilingual tasks by learning a shallow downstream model. Similar to the SUPERB benchmark, we find speech SSL models can significantly improve performance compared to FBANK features. Furthermore, we find that multilingual models do not always perform better than their monolingual counterparts. We will release ML-SUPERB as a challenge with organized datasets and reproducible training scripts for future multilingual representation research.
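The "frozen SSL features + shallow downstream model" paradigm described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the benchmark's actual pipeline: the stand-in convolutional encoder below is a hypothetical placeholder for a real pretrained SSL model such as HuBERT or wav2vec 2.0 (which in practice would be loaded from a toolkit), and the downstream head is a simple linear classifier for language identification over 143 languages.

```python
import torch
import torch.nn as nn

# Placeholder "SSL encoder": a stand-in for a pretrained model such as
# HuBERT or wav2vec 2.0. In a real setup this would be loaded from a
# pretrained checkpoint rather than randomly initialized.
ssl_encoder = nn.Sequential(
    nn.Conv1d(1, 64, kernel_size=10, stride=5),
    nn.ReLU(),
    nn.Conv1d(64, 768, kernel_size=3, stride=2),
)

# Freeze the encoder: its weights receive no gradient updates,
# matching the benchmark's frozen-representation setting.
for p in ssl_encoder.parameters():
    p.requires_grad = False
ssl_encoder.eval()

# Shallow downstream model: a single linear layer for language
# identification over the benchmark's 143 languages.
num_languages = 143
downstream = nn.Linear(768, num_languages)

optimizer = torch.optim.Adam(downstream.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One toy training step on random audio (batch of 4 one-second
# 16 kHz clips) with random language labels.
waveform = torch.randn(4, 1, 16000)
labels = torch.randint(0, num_languages, (4,))

with torch.no_grad():                  # encoder stays frozen
    features = ssl_encoder(waveform)   # (batch, 768, time)
pooled = features.mean(dim=-1)         # mean-pool over time -> (batch, 768)

logits = downstream(pooled)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Only the downstream head's weights change; the encoder is untouched.
```

Only the shallow head is trained, so the comparison across SSL models (or against FBANK features, which would simply replace the encoder output) isolates the quality of the frozen representations themselves.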