Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR

📅 2026-03-05
🤖 AI Summary
This work addresses the prohibitive computational cost of repeating full fine-tuning of large speech foundation models whenever new data arrives, and the challenge of consolidating multiple domain-specific checkpoints into a single model for multi-domain speech recognition. To this end, we propose BoostedTSV-M, a merging algorithm based on TSV-M that mitigates rank collapse through singular-value boosting and improves numerical stability. We benchmark 11 merging algorithms on 10 European Portuguese domains, evaluating in-domain accuracy, robustness under distribution shift, and English and multilingual performance. Overall, the approach outperforms full fine-tuning on European Portuguese while preserving out-of-distribution generalisation in a single model.

📝 Abstract
Model merging is a scalable alternative to multi-task training that combines the capabilities of multiple specialised models into a single model. This is particularly attractive for large speech foundation models, which are typically adapted through domain-specific fine-tuning. The result is multiple customised checkpoints, and repeating full fine-tuning whenever new data becomes available is computationally prohibitive. In this work, we study model merging for multi-domain ASR and benchmark 11 merging algorithms on 10 European Portuguese domains, evaluating in-domain accuracy, robustness under distribution shift, and English and multilingual performance. We further propose BoostedTSV-M, a new merging algorithm based on TSV-M that mitigates rank collapse via singular-value boosting and improves numerical stability. Overall, our approach outperforms full fine-tuning on European Portuguese while preserving out-of-distribution generalisation in a single model.
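
To make the idea of merging concrete, the sketch below shows one of the simplest merging schemes, task arithmetic: each fine-tuned checkpoint is reduced to a "task vector" (its difference from the shared base weights), and the scaled mean of those vectors is added back to the base. This is a generic illustration and not BoostedTSV-M or TSV-M, whose details are not given in this listing. The function name `merge_task_arithmetic`, the `scale` parameter, the parameter name `"proj"`, and the toy weights are all assumptions made for the example.

```python
def merge_task_arithmetic(base, finetuned, scale=1.0):
    """Merge checkpoints by adding the scaled mean task vector to the base.

    base:      dict mapping parameter name -> flat list of floats
    finetuned: list of dicts with the same keys and shapes as `base`
    scale:     hypothetical scaling coefficient applied to the mean task vector
    """
    merged = {}
    for name, base_weights in base.items():
        merged[name] = []
        for i, b in enumerate(base_weights):
            # Task vector entry: what each checkpoint changed relative to the base.
            deltas = [ckpt[name][i] - b for ckpt in finetuned]
            merged[name].append(b + scale * sum(deltas) / len(deltas))
    return merged


# Toy weights standing in for two domain-specific checkpoints of one base model.
base = {"proj": [1.0, 1.0]}
domain_a = {"proj": [3.0, 1.0]}
domain_b = {"proj": [1.0, 5.0]}

merged = merge_task_arithmetic(base, [domain_a, domain_b])
print(merged)
```

With `scale=1.0` this reduces to plain averaging of the checkpoints; other values of `scale` weight the combined domain updates more or less strongly relative to the base model.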
Problem

Research questions and friction points this paper is trying to address.

Model Merging
Multi-Domain Adaptation
Automatic Speech Recognition
Domain-Specific Fine-Tuning
Scalable Model Integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Merging
Multi-Domain Adaptation
BoostedTSV-M
Speech Recognition
Rank Collapse