Parameter Importance-Driven Continual Learning for Foundation Models

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Catastrophic forgetting during domain adaptation severely degrades the general reasoning capabilities of large language and multimodal models. To address this, we propose PIECE, a parameter-importance-driven continual learning framework that preserves both general-purpose reasoning ability and domain-specific knowledge, without accessing historical data, increasing model parameters, or modifying the architecture. Our key contributions are: (i) two importance estimators, PIECE-F based on Fisher information and PIECE-S based on a second-order normalization that combines gradient and curvature information, which identify the 0.1% of parameters most critical to a new task; and (ii) a replay-free, parameter-efficient tuning (PET) strategy that updates only these core parameters. Evaluated on three language models and two multimodal models, PIECE achieves state-of-the-art continual learning performance while significantly outperforming baselines in retaining general reasoning capabilities, as measured by benchmarks such as MMLU and BBH.

📝 Abstract
Domain-specific post-training often causes catastrophic forgetting, making foundation models lose their general reasoning ability and limiting their adaptability to dynamic real-world environments. Preserving general capabilities while acquiring downstream domain knowledge is a central challenge for large language and multimodal models. Traditional continual learning methods, such as regularization, replay and architectural isolation, suffer from poor downstream performance, reliance on inaccessible historical data, or additional parameter overhead. While recent parameter-efficient tuning (PET) methods can alleviate forgetting, their effectiveness strongly depends on the choice of parameters and update strategies. In this paper, we introduce PIECE, a Parameter Importance Estimation-based Continual Enhancement method that preserves general ability while efficiently learning domain knowledge without accessing prior training data or increasing model parameters. PIECE selectively updates only 0.1% of core parameters most relevant to new tasks, guided by two importance estimators: PIECE-F based on Fisher Information, and PIECE-S based on a second-order normalization that combines gradient and curvature information. Experiments across three language models and two multimodal models show that PIECE maintains general capabilities and achieves state-of-the-art continual learning performance across diverse downstream tasks. Our results highlight a practical path to scalable, domain-adaptive foundation models without catastrophic forgetting.
Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic forgetting in foundation models during domain-specific post-training
Preserves general reasoning ability while acquiring downstream domain knowledge
Enables continual learning without historical data access or parameter overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selectively updates only the 0.1% of core parameters most relevant to new tasks
Estimates parameter importance via Fisher information (PIECE-F) and a second-order normalization combining gradient and curvature information (PIECE-S)
Preserves general capabilities without accessing prior data
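The selection step described above can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the diagonal-Fisher proxy (mean squared per-example gradient), the curvature normalization, and the exact top-0.1% thresholding are guesses based on the summary, and `fisher_importance`, `second_order_importance`, and `top_k_mask` are hypothetical helper names.

```python
import numpy as np

def fisher_importance(grads):
    # Diagonal Fisher proxy: mean squared gradient per parameter,
    # averaged over a batch of per-example gradients.
    return np.mean(np.square(grads), axis=0)

def second_order_importance(grads, curvature, eps=1e-8):
    # Hypothetical PIECE-S-style score: squared gradient normalized
    # by a curvature estimate (e.g., a diagonal Hessian approximation).
    g2 = np.mean(np.square(grads), axis=0)
    return g2 / (np.abs(curvature) + eps)

def top_k_mask(importance, fraction=0.001):
    # Boolean mask selecting the top `fraction` of parameters by
    # importance; only these would be updated during fine-tuning.
    k = max(1, int(len(importance) * fraction))
    threshold = np.partition(importance, -k)[-k]
    return importance >= threshold

# Toy example: 16 per-example gradients over a 10,000-parameter model.
rng = np.random.default_rng(0)
grads = rng.normal(size=(16, 10_000))
imp = fisher_importance(grads)
mask = top_k_mask(imp, fraction=0.001)
print(mask.sum())  # number of parameters selected for update
```

In an actual training loop, the mask would be applied to zero out gradients of all unselected parameters before each optimizer step, so only the identified core parameters change.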
Lingxiang Wang
Beihang University
NLP
Hainan Zhang
Beihang University
Dialogue Generation · Text Generation · Federated Learning · Natural Language Processing
Zhiming Zheng
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University; School of Artificial Intelligence, Beihang University