Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Speech-enabled large language models (speech LLMs) commonly suffer degraded text reasoning capability once speech functionality is integrated, hindering effective use of their pre-trained textual knowledge. Method: We propose a layer-wise analytical framework grounded in parameter importance estimation to uncover the intrinsic mechanism — speech fine-tuning induces distributional shifts in text-critical parameters. Building on this insight, we design an optimization strategy that jointly preserves textual competence and adapts to the speech modality, combining parameter-importance-weighted, layer-adaptive learning rate scheduling with LoRA-based fine-tuning. Contribution/Results: Experiments demonstrate that our method outperforms full-parameter fine-tuning on speech tasks — including automatic speech recognition (ASR) and speech-based question answering — while largely preserving the original LLM’s text reasoning performance. This work provides an interpretable, scalable theoretical foundation and a practical methodology for capability synergy in multimodal large language models.

📝 Abstract
The integration of speech into Large Language Models (LLMs) has substantially expanded their capabilities, but often at the cost of weakening their core textual competence. This degradation limits the ability of speech-enabled LLMs to fully exploit their pre-trained text-based knowledge. In this work, we analyze the underlying mechanisms of this issue through a focused study of the widely used encoder-adaptor paradigm. We propose an analytical framework based on parameter importance estimation, which reveals that fine-tuning for speech introduces a textual importance distribution shift: the layer-wise allocation of parameters critical to textual reasoning is disrupted. Building on this insight, we investigate two mitigation strategies: layer-wise learning rate scheduling and Low-Rank Adaptation (LoRA), both aiming to preserve the original parameter distribution. Experimental results show that both approaches maintain textual competence better than full fine-tuning, while also improving downstream spoken question answering performance. Furthermore, our analysis offers a principled explanation for the effectiveness of the proposed mitigation strategies, linking their benefits to the structural properties of textual knowledge in LLMs.
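The paper does not reproduce its exact importance estimator here. A common first-order choice (an assumption for illustration, not necessarily the authors' formula) scores each parameter by |θ · ∂L/∂θ| and aggregates per layer; a change in this per-layer profile after speech fine-tuning is one way to measure the "textual importance distribution shift" the abstract describes. A minimal sketch, with toy values standing in for real tensors:

```python
def layer_importance(params, grads):
    """First-order importance per layer: sum of |theta * dL/dtheta|.
    `params` and `grads` map layer name -> list of values; the layer
    names and numbers below are illustrative, not from the paper."""
    return {
        layer: sum(abs(p * g) for p, g in zip(params[layer], grads[layer]))
        for layer in params
    }

def normalize(importance):
    """Turn raw scores into a distribution so profiles are comparable."""
    total = sum(importance.values())
    return {k: v / total for k, v in importance.items()}

# Toy profiles for two transformer layers, before / after speech fine-tuning.
before = normalize(layer_importance(
    {"layer.0": [0.5, -1.0], "layer.1": [2.0, 0.1]},
    {"layer.0": [0.2, 0.3], "layer.1": [0.1, 0.4]},
))
after = normalize(layer_importance(
    {"layer.0": [0.5, -1.0], "layer.1": [0.2, 0.1]},
    {"layer.0": [0.9, 0.8], "layer.1": [0.1, 0.1]},
))
# An L1 distance between the two profiles quantifies the shift.
shift = sum(abs(before[k] - after[k]) for k in before)
```

In this toy run the importance mass moves heavily toward `layer.0` after fine-tuning, which is the kind of redistribution the framework is built to detect.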
Problem

Research questions and friction points this paper is trying to address.

Analyzing textual capability degradation in speech-enabled large language models
Investigating parameter importance distribution shift during speech fine-tuning
Developing mitigation strategies to preserve textual reasoning competence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter importance estimation analyzes textual degradation
Layer-wise learning rate scheduling preserves parameter distribution
Low-Rank Adaptation maintains textual competence during fine-tuning
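The two mitigation lines above can be sketched together. The inverse-importance learning-rate rule and the `floor` constant are illustrative assumptions, not the paper's published schedule; the LoRA part is the standard W + (α/r)·BA reparameterization with the base weight frozen:

```python
def layerwise_lrs(importance, base_lr=1e-4, floor=0.1):
    """Assumed rule: scale each layer's LR down as its textual
    importance goes up, so text-critical layers change least."""
    max_imp = max(importance.values())
    return {layer: base_lr * max(floor, 1.0 - imp / max_imp)
            for layer, imp in importance.items()}

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, alpha=16, r=2):
    """Standard LoRA: effective weight = W + (alpha / r) * B @ A,
    where W stays frozen and only low-rank A, B are trained."""
    scale = alpha / r
    delta = matmul(B, A)                    # (d_out x r) @ (r x d_in)
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]

# Text-critical layer.0 hits the LR floor; layer.1 adapts faster.
lrs = layerwise_lrs({"layer.0": 0.8, "layer.1": 0.2})

W = [[1.0, 0.0], [0.0, 1.0]]                # frozen base weight
A = [[0.01, 0.0], [0.0, 0.01]]              # r x d_in
B = [[0.1, 0.0], [0.0, 0.1]]                # d_out x r
W_eff = lora_weight(W, A, B)
```

Keeping the update low-rank and the base weight frozen is what lets the original parameter distribution survive speech fine-tuning; the layer-wise schedule adds the same protection for any full-rank updates that remain.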
Chao Wang
National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China
Rui-Chen Zheng
National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China
Yang Ai
Associate Researcher, University of Science and Technology of China
Speech Synthesis · Speech Enhancement · Speech Coding · Deep Learning
Zhen-Hua Ling
National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China