The Eloquence team submission for task 1 of MLC-SLM challenge

📅 2025-07-25
🤖 AI Summary
This work addresses the limited robustness of automatic speech recognition (ASR) in multilingual conversational speech, with the aim of improving cross-lingual generalization and the use of conversational context. It follows three directions: (1) the official baseline is evaluated by training linear and QFormer projectors with different foundation models; (2) a custom multilingual linear projector is trained within the SLAM-ASR framework; and (3) a contrastive learning objective and extended conversational context are investigated as ways to strengthen speech–text alignment and recognition robustness. Experiments on the multilingual conversational ASR benchmark are reported to improve recognition accuracy over the official baseline, and the projector design, context modeling, and contrastive learning are described as complementary. The approach is presented as a reproducible path toward robust multilingual conversational ASR.
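To make the contrastive objective mentioned above concrete, here is a minimal sketch of one common formulation, an InfoNCE-style loss over paired speech and text embeddings. It is an illustration only: the function names, the temperature value, and the toy two-dimensional embeddings are assumptions, and this page does not say which contrastive loss the submission actually uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(speech, text, temperature=0.1):
    """Mean InfoNCE loss where speech[i] and text[i] are the positive pair.

    Every other text embedding in the batch serves as a negative.
    The temperature is a hypothetical default, not a value from the paper.
    """
    losses = []
    for i, s in enumerate(speech):
        logits = [cosine(s, t) / temperature for t in text]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumed, for illustration only).
speech_embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[0.9, 0.1], [0.1, 0.9]]
swapped_text = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = info_nce(speech_embeddings, aligned_text)
swapped_loss = info_nce(speech_embeddings, swapped_text)
```

In this toy setup the loss is small when each utterance embedding lies closest to its own transcript embedding and large when the pairs are swapped, which is the pressure a contrastive term adds on top of the recognition loss.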

📝 Abstract
In this paper, we present our studies and experiments carried out for Task 1 of the Challenge and Workshop on Multilingual Conversational Speech Language Model (MLC-SLM), which focuses on advancing multilingual conversational speech recognition through the development of speech language model architectures. Given the increasing relevance of real-world conversational data for building robust Spoken Dialogue Systems, we explore three approaches to multilingual ASR. First, we evaluate the official baseline to better understand its strengths and limitations by training two projectors (linear and QFormer) with different foundation models. Second, we leverage the SLAM-ASR framework to train a custom multilingual linear projector. Finally, we investigate the role of contrastive learning and extended conversational context in enhancing the robustness of recognition.
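As a rough illustration of what a linear projector does in this kind of system, the sketch below shortens a sequence of speech-encoder frames by stacking neighbours and then maps each stacked vector into the language model's embedding space with a single linear layer. Frame stacking before projection is a common arrangement, but the stacking factor `k`, the dimensions, and the weights here are hypothetical, and the submission's actual projector may differ.

```python
def stack_frames(frames, k):
    """Concatenate every k consecutive frames, dropping an incomplete tail."""
    usable = len(frames) - len(frames) % k
    return [
        [value for frame in frames[i:i + k] for value in frame]
        for i in range(0, usable, k)
    ]


def linear(x, weight, bias):
    """Apply y = W x + b to a single vector x."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def project(frames, weight, bias, k):
    """Downsample by frame stacking, then project each stacked vector."""
    return [linear(x, weight, bias) for x in stack_frames(frames, k)]


# Toy input (assumed): 5 encoder frames of dimension 2, stacked in pairs
# and projected to a 3-dimensional "LLM embedding" space.
encoder_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
weight = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
bias = [0.0, 0.0, 1.0]

projected = project(encoder_frames, weight, bias, k=2)
```

With `k=2`, the five input frames become two projected vectors (the unpaired fifth frame is dropped), so the language model sees a shorter sequence in its own embedding dimension. In a trained system the weights would be learned while the encoder and language model are typically kept frozen or lightly tuned.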
Problem

Research questions and friction points this paper is trying to address.

Advance multilingual conversational speech recognition models
Evaluate baseline strengths and limitations using different projectors
Enhance recognition robustness with contrastive learning and context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated baseline with linear and QFormer projectors
Trained custom multilingual linear projector
Explored contrastive learning for robust recognition