🤖 AI Summary
Catastrophic forgetting during domain adaptation severely degrades the general reasoning capabilities of large language and multimodal models. To address this, we propose PIECE, a parameter-importance-driven continual learning framework that preserves both general-purpose reasoning ability and domain-specific knowledge without accessing historical data, increasing model parameters, or modifying the architecture. Our key contributions are: (i) a dual importance estimation mechanism that jointly normalizes Fisher information and second-order curvature to precisely identify the top 0.1% most critical parameters for targeted fine-tuning; and (ii) a replay-free, parameter-efficient tuning (PET) strategy built on this selection. Evaluated on three language models and two multimodal models, PIECE achieves state-of-the-art continual learning performance while significantly outperforming baselines in retaining general reasoning capabilities, as measured by benchmarks including MMLU and BBH.
📝 Abstract
Domain-specific post-training often causes catastrophic forgetting, causing foundation models to lose their general reasoning ability and limiting their adaptability to dynamic real-world environments. Preserving general capabilities while acquiring downstream domain knowledge is a central challenge for large language and multimodal models. Traditional continual learning methods, such as regularization, replay, and architectural isolation, suffer from poor downstream performance, reliance on inaccessible historical data, or additional parameter overhead. While recent parameter-efficient tuning (PET) methods can alleviate forgetting, their effectiveness depends strongly on which parameters are updated and how. In this paper, we introduce PIECE, a Parameter Importance Estimation-based Continual Enhancement method that preserves general ability while efficiently learning domain knowledge, without accessing prior training data or increasing model parameters. PIECE selectively updates only the 0.1% of core parameters most relevant to new tasks, guided by two importance estimators: PIECE-F, based on Fisher information, and PIECE-S, based on a second-order normalization that combines gradient and curvature information. Experiments across three language models and two multimodal models show that PIECE maintains general capabilities and achieves state-of-the-art continual learning performance across diverse downstream tasks. Our results highlight a practical path to scalable, domain-adaptive foundation models without catastrophic forgetting.
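To make the selection step concrete, here is a minimal sketch of Fisher-information-based importance scoring and top-0.1% parameter selection. This is an illustrative reconstruction, not the paper's implementation: the function names (`fisher_importance`, `top_fraction_mask`) and the use of the diagonal empirical Fisher (mean of squared per-sample gradients) are our assumptions; the paper's PIECE-S estimator additionally incorporates curvature, which is omitted here.

```python
import numpy as np

def fisher_importance(grads: np.ndarray) -> np.ndarray:
    # Diagonal empirical Fisher approximation: average of squared
    # per-sample gradients for each parameter (hypothetical helper).
    return np.mean(np.square(grads), axis=0)

def top_fraction_mask(importance: np.ndarray, fraction: float = 0.001) -> np.ndarray:
    # Boolean mask selecting the top `fraction` (e.g. 0.1%) of parameters
    # by importance; only these would be updated during fine-tuning.
    k = max(1, int(round(fraction * importance.size)))
    top_idx = np.argpartition(importance, -k)[-k:]
    mask = np.zeros(importance.shape, dtype=bool)
    mask[top_idx] = True
    return mask

# Toy usage: gradients from 8 samples over 10,000 parameters.
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 10_000))
imp = fisher_importance(grads)
mask = top_fraction_mask(imp, fraction=0.001)  # selects 10 parameters
```

During training, the gradient update would then be applied only where `mask` is true, leaving the remaining 99.9% of parameters, and with them the model's general capabilities, untouched.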