🤖 AI Summary
To address the high computational overhead, cross-lingual interference, suboptimal training configurations, and poor scalability of joint training for multilingual speech recognition and translation, this paper proposes LoRS-Merging, a low-rank and sparse collaborative model-merging paradigm. It is the first framework to unify low-rank approximation, structured sparsity pruning, and parameter-space alignment within a single model-merging setting. By preserving language-specific structural essentials while suppressing cross-lingual interference, LoRS-Merging enables efficient, lossless fusion of monolingual models. Evaluated on multilingual speech-to-text (S2T) benchmarks, it consistently outperforms joint-training baselines, including Whisper, achieving a 32% inference speedup and a 41% memory reduction. Moreover, it supports plug-and-play language expansion without retraining.
📝 Abstract
Language diversity presents a significant challenge in speech-to-text (S2T) tasks such as automatic speech recognition and translation. Traditional multi-task training approaches address this by jointly optimizing multiple speech recognition and translation tasks across various languages. While models like Whisper, built on these strategies, demonstrate strong performance, they still suffer from high computational cost, language interference, suboptimal training configurations, and limited extensibility. To overcome these challenges, we introduce LoRS-Merging (low-rank and sparse model merging), a novel technique that efficiently integrates models trained on different languages or tasks while preserving performance and reducing computational overhead. LoRS-Merging combines low-rank and sparse pruning to retain essential structures while eliminating redundant parameters, mitigating language and task interference, and enhancing extensibility. Experimental results across a range of languages demonstrate that LoRS-Merging significantly outperforms conventional multilingual multi-task training baselines. Our findings suggest that model merging, particularly LoRS-Merging, is a scalable and effective complement to traditional multilingual training strategies for S2T applications.
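To make the idea concrete, here is a minimal sketch of the merging recipe the abstract describes: compute each monolingual model's weight delta against a shared base, keep only its low-rank structure (truncated SVD) and its largest-magnitude entries (sparse pruning), then sum the filtered deltas back into the base. The paper does not publish this exact pseudocode; the function names, the rank/sparsity hyperparameters, and the uniform averaging of deltas are illustrative assumptions, shown for a single weight matrix rather than a full model.

```python
import numpy as np

def low_rank(delta, rank):
    """Truncated-SVD approximation of a task vector (weight delta)."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

def sparse_prune(delta, keep_ratio):
    """Magnitude pruning: zero out all but the largest-magnitude entries."""
    k = max(1, int(delta.size * keep_ratio))
    thresh = np.sort(np.abs(delta).ravel())[-k]
    return np.where(np.abs(delta) >= thresh, delta, 0.0)

def lors_merge(base, finetuned_list, rank=4, keep_ratio=0.1, alpha=1.0):
    """Merge monolingual models (illustrative): low-rank + sparse-filtered
    deltas are averaged and added back onto the shared base weights."""
    merged = base.copy()
    for ft in finetuned_list:
        delta = ft - base                      # per-language task vector
        delta = sparse_prune(low_rank(delta, rank), keep_ratio)
        merged += alpha * delta / len(finetuned_list)
    return merged

# Toy usage: merge two "monolingual" perturbations of one base matrix.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 8))
finetuned = [base + 0.1 * rng.normal(size=(8, 8)) for _ in range(2)]
merged = lors_merge(base, finetuned, rank=2, keep_ratio=0.2)
```

In this reading, low-rank truncation keeps the dominant language-specific structure of each delta, while magnitude pruning discards the small, diffuse entries most likely to interfere across languages; adding a new language is then just one more filtered delta, which is why no retraining is needed.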