TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work describes the TalTech entry to the Interspeech 2025 ML-SUPERB 2.0 challenge, which covers language identification (LID) and multilingual automatic speech recognition (ASR). The LID component is a hybrid: a pretrained language embedding model is combined with a lightweight speech recognition model that shares one encoder across languages and uses language-specific bigram language models. For ASR, three models are available: a fine-tuned SeamlessM4T, MMS-1B-all with custom language adapters, and MMS zero-shot. Only one of them is applied to each language, chosen by training data availability and performance on held-out data. The system obtained the top overall score in the challenge.

📝 Abstract
This paper describes the language identification and multilingual speech recognition system developed at Tallinn University of Technology for the Interspeech 2025 ML-SUPERB 2.0 Challenge. A hybrid language identification system is used, consisting of a pretrained language embedding model and a light-weight speech recognition model with a shared encoder across languages and language-specific bigram language models. For speech recognition, three models are used, where only a single model is applied for each language, depending on the training data availability and performance on held-out data. The model set consists of a finetuned version of SeamlessM4T, MMS-1B-all with custom language adapters and MMS-zeroshot. The system obtained the top overall score in the challenge.
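The per-language model choice described in the abstract can be pictured with a minimal sketch. The model keys, the use of character error rate as the held-out metric, and the fallback to the zero-shot model are all assumptions made for illustration; the abstract only states that one of the three models is applied per language, depending on training data availability and held-out performance.

```python
from typing import Dict, Optional

# Hypothetical keys for the three model families named in the abstract.
CANDIDATES = ("seamless_m4t_finetuned", "mms_1b_all_adapter", "mms_zeroshot")


def select_model(heldout_cer: Dict[str, Optional[float]],
                 fallback: str = "mms_zeroshot") -> str:
    """Pick the candidate with the lowest held-out error for one language.

    `heldout_cer` maps a model key to its error rate on held-out data, or
    None when the model could not be trained or evaluated for this language
    (for example, because no training data exists). Using CER and falling
    back to the zero-shot model are assumptions of this sketch.
    """
    scored = {m: e for m, e in heldout_cer.items() if e is not None}
    if not scored:
        return fallback
    return min(scored, key=scored.get)


# Usage: a language with training data for two of the three models.
choice = select_model({
    "seamless_m4t_finetuned": 0.12,
    "mms_1b_all_adapter": 0.09,
    "mms_zeroshot": 0.30,
})
```

A lookup of this kind keeps the decision outside the models themselves, so each language is routed to exactly one recognizer at inference time.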
Problem

Research questions and friction points this paper is trying to address.

Develops hybrid language identification system for multilingual speech
Uses multiple speech recognition models tailored per language
Achieves top score in ML-SUPERB 2.0 Challenge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid language identification system
Shared encoder across languages
Custom language adapters for MMS-1B-all
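To illustrate how language-specific bigram language models could contribute to a hybrid LID decision, here is a rough sketch. It is not the authors' implementation: the add-one smoothing, the length normalization, the linear interpolation with weight `lm_weight`, and the names `BigramLM` and `identify` are all assumptions. The token sequence stands in for the output of the shared-encoder recognizer, and `embed_logpost` stands in for log-posteriors from the pretrained language embedding model.

```python
import math
from collections import Counter
from typing import Dict, Sequence


class BigramLM:
    """Add-one smoothed bigram model over token sequences for one language."""

    def __init__(self, corpus: Sequence[Sequence[str]]):
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()
        for sent in corpus:
            tokens = ["<s>", *sent, "</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def logprob(self, sent: Sequence[str], vocab_size: int) -> float:
        """Log-probability of `sent`; `vocab_size` is the shared token inventory size."""
        tokens = ["<s>", *sent, "</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + vocab_size
            total += math.log(num / den)
        return total


def identify(tokens: Sequence[str],
             embed_logpost: Dict[str, float],
             lms: Dict[str, BigramLM],
             vocab_size: int,
             lm_weight: float = 0.5) -> str:
    """Return the language with the best interpolated score (assumed fusion rule)."""
    scores = {}
    for lang, lm in lms.items():
        # Normalize by the number of bigram transitions so long inputs are not penalized.
        lm_score = lm.logprob(tokens, vocab_size) / (len(tokens) + 1)
        scores[lang] = (1 - lm_weight) * embed_logpost[lang] + lm_weight * lm_score
    return max(scores, key=scores.get)


# Usage with toy character-level data (illustrative only).
lms = {
    "est": BigramLM([list("tere"), list("aitäh")]),
    "eng": BigramLM([list("hello"), list("thanks")]),
}
flat = {"est": math.log(0.5), "eng": math.log(0.5)}
guess = identify(list("tere"), flat, lms, vocab_size=30)
```

Because the encoder is shared, only the small bigram models are language-specific in this sketch, which is what makes adding a language cheap.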
Tanel Alumäe
Department of Software Science, Tallinn University of Technology, Estonia
Artem Fedorchenko
Research Software Developer
Deep Learning, Signal Processing