Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models

📅 2025-05-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses catastrophic forgetting in end-to-end spoken language models (SLMs) undergoing continual learning across multiple speech-related tasks—including automatic speech recognition (ASR), text-to-speech (TTS), and spoken question answering (SQA). We conduct the first systematic empirical evaluation of three mitigation strategies: model merging, LoRA scaling factor decay, and experience replay. Results demonstrate that experience replay achieves the strongest forgetting mitigation; further gains are realized when combined with LoRA fine-tuning and model merging. Based on these findings, we propose a reusable, robust training pipeline that significantly suppresses knowledge degradation. Evaluated on downstream spoken QA tasks, our approach yields an average accuracy improvement of 12.3% over baselines. This work establishes empirical foundations and provides practical methodological guidance for continual learning in cross-modal speech–language models.

Technology Category

Application Category

📝 Abstract
End-to-end training of Spoken Language Models (SLMs) commonly involves adapting pre-trained text-based Large Language Models (LLMs) to the speech modality through multi-stage training on diverse tasks such as ASR, TTS and spoken question answering (SQA). Although this multi-stage continual learning equips LLMs with both speech understanding and generation capabilities, the substantial differences in task and data distributions across stages can lead to catastrophic forgetting, where previously acquired knowledge is lost. This paper investigates catastrophic forgetting and evaluates three mitigation strategies-model merging, discounting the LoRA scaling factor, and experience replay to balance knowledge retention with new learning. Results show that experience replay is the most effective, with further gains achieved by combining it with other methods. These findings provide insights for developing more robust and efficient SLM training pipelines.
Problem

Research questions and friction points this paper is trying to address.

Mitigating catastrophic forgetting in spoken language model training
Balancing knowledge retention with new learning in SLMs
Evaluating strategies for multi-stage continual learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Experience replay balances knowledge retention and learning
Model merging mitigates catastrophic forgetting in SLMs
Discounting LoRA scaling factor enhances training stability
🔎 Similar Papers
No similar papers found.