Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

📅 2025-11-12
🤖 AI Summary
Over 7,000 languages worldwide lack automatic speech recognition (ASR) support, especially low-resource, long-tail languages, due to limits on architectural scalability, prohibitive data-acquisition costs, and ethical risks. Method: We propose a scalable multilingual ASR system built on self-supervised pre-training scaled to 7B parameters, with an encoder-decoder architecture whose LLM-inspired decoder enables robust zero-shot cross-lingual generalization from minimal speech data. Contribution/Results: The system supports 1,600+ languages, including over 500 previously unsupported, using publicly available data and community-sourced multilingual speech corpora. Experiments demonstrate substantial gains over state-of-the-art methods in ultra-low-resource settings. We open-source a family of models ranging from 300M to 7B parameters, covering both edge-device deployment and high-accuracy applications.

📝 Abstract
Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most, all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date, including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.
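The automatic evaluations mentioned in the abstract are conventionally reported as word error rate (WER), the standard ASR metric: the word-level edit distance between reference and hypothesis, normalized by reference length. A minimal sketch of that computation (an illustration of the metric, not code from the Omnilingual ASR repository):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (rolling single-row DP).
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rw != hw)))   # substitution
        prev = cur
    return prev[-1] / max(len(r), 1)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # 0.333...  (1 substitution / 3 words)
```

Character error rate (CER), often preferred for languages without whitespace tokenization, is the same computation over characters instead of words.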
Problem

Research questions and friction points this paper is trying to address.

Developing ASR for unsupported long-tail languages with limited data
Overcoming architectural limitations restricting multilingual ASR scalability
Addressing ethical concerns in language technology development through collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised pre-training scaled to 7B parameters for robust speech representations
Encoder-decoder architecture designed for zero-shot generalization to unseen languages
LLM-inspired decoder enabling new languages to be added from a handful of samples
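One way to picture the zero-shot mechanism above: an LLM-style decoder generates text autoregressively, conditioned on a language token in its prompt, so supporting a new language can amount to introducing a new token rather than a new output head. A toy, hypothetical sketch of that conditioning loop (the names `greedy_decode`, `step_fn`, and the token strings are illustrative, not from the released code):

```python
def greedy_decode(step_fn, lang_token, bos="<bos>", eos="<eos>", max_len=16):
    """Autoregressive greedy decoding conditioned on a language token.

    step_fn(prefix) -> next token; in a real system this would be the
    encoder-decoder forward pass attending over audio features.
    """
    prefix = [bos, lang_token]
    while len(prefix) < max_len:
        tok = step_fn(tuple(prefix))
        prefix.append(tok)
        if tok == eos:
            break
    # Strip <bos>, the language token, and a trailing <eos>.
    return prefix[2:-1] if prefix[-1] == eos else prefix[2:]

# Toy "model": a lookup table standing in for the neural network.
toy_model = {
    ("<bos>", "<eng>"): "hello",
    ("<bos>", "<eng>", "hello"): "world",
    ("<bos>", "<eng>", "hello", "world"): "<eos>",
}
print(greedy_decode(toy_model.get, "<eng>"))  # ['hello', 'world']
```

The point of the sketch is the prompt structure: because the language identity enters as input conditioning rather than as a separate classification head, a handful of samples for a new language token can suffice to extend coverage.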
Authors
Gil Keren, FAIR at Meta
Artyom Kozhevnikov, FAIR at Meta
Yen Meng, FAIR at Meta
C. Ropers, FAIR at Meta
Matthew Setzler, FAIR at Meta
Skyler Wang, Assistant Professor, McGill University | Research Scientist, Meta
Ifeoluwanimi Adebara, Department of Sociology, McGill University
Michael Auli, FAIR at Meta
Can Balioglu, FAIR at Meta
Kevin Chan, FAIR at Meta
Chierh Cheng, FAIR at Meta
Joe Chuang, FAIR at Meta
Caley Droof, FAIR at Meta
Mark Duppenthaler, FAIR at Meta
Paul-Ambroise Duquenne, Meta AI, FAIR
Alexander Erben, FAIR at Meta
Cynthia Gao, FAIR at Meta
Gabriel Mejia Gonzalez, FAIR at Meta
Kehan Lyu, FAIR at Meta
Sagar Miglani, FAIR at Meta
Vineel Pratap, WaveForms AI
Kaushik Ram Sadagopan, FAIR at Meta
Safiyyah Saleem, FAIR at Meta
Arina Turkatenko, FAIR at Meta
Albert Ventayol-boada, FAIR at Meta
Zheng-Xin Yong, Brown University
Yu-An Chung, Facebook AI Research (FAIR)
Jean Maillard, Meta AI
Rashel Moritz, FAIR at Meta
Alex Mourachko, FAIR at Meta
Mary Williamson, FAIR at Meta
Shireen Yates, FAIR at Meta