🤖 AI Summary
Catastrophic forgetting during domain adaptation severely degrades the general reasoning capabilities of large language and multimodal models. To address this, we propose PIECE, a parameter-importance-driven continual learning framework that preserves both general-purpose reasoning ability and domain-specific knowledge without accessing historical data, increasing model parameters, or modifying the architecture. Our key contributions are: (i) a dual importance estimation mechanism that jointly normalizes Fisher information and second-order curvature to precisely identify the top 0.1% most critical parameters for targeted fine-tuning; and (ii) a replay-free, parameter-efficient tuning (PET) strategy built on this selection. Evaluated on three language models and two multimodal models, PIECE achieves state-of-the-art continual learning performance while significantly outperforming baselines in retaining general reasoning capabilities, as measured by benchmarks including MMLU and BBH.
📝 Abstract
Domain-specific post-training often causes catastrophic forgetting, causing foundation models to lose their general reasoning ability and limiting their adaptability to dynamic real-world environments. Preserving general capabilities while acquiring downstream domain knowledge is a central challenge for large language and multimodal models. Traditional continual learning methods, such as regularization, replay, and architectural isolation, suffer from poor downstream performance, reliance on inaccessible historical data, or additional parameter overhead. While recent parameter-efficient tuning (PET) methods can alleviate forgetting, their effectiveness depends strongly on which parameters are updated and how. In this paper, we introduce PIECE, a Parameter Importance Estimation-based Continual Enhancement method that preserves general ability while efficiently learning domain knowledge, without accessing prior training data or increasing model parameters. PIECE selectively updates only the 0.1% of core parameters most relevant to new tasks, guided by two importance estimators: PIECE-F, based on Fisher information, and PIECE-S, based on a second-order normalization that combines gradient and curvature information. Experiments across three language models and two multimodal models show that PIECE maintains general capabilities and achieves state-of-the-art continual learning performance across diverse downstream tasks. Our results highlight a practical path to scalable, domain-adaptive foundation models without catastrophic forgetting.
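To make the selection step concrete, here is a minimal sketch of Fisher-information-based importance scoring and top-0.1% parameter selection. This is an illustrative reconstruction, not the paper's implementation: the function names (`fisher_importance`, `top_fraction_mask`) and the use of the diagonal empirical Fisher (mean of squared per-sample gradients) are our assumptions; the paper's PIECE-S estimator additionally incorporates curvature, which is omitted here.

```python
import numpy as np

def fisher_importance(grads: np.ndarray) -> np.ndarray:
    # Diagonal empirical Fisher approximation: average of squared
    # per-sample gradients for each parameter (hypothetical helper).
    return np.mean(np.square(grads), axis=0)

def top_fraction_mask(importance: np.ndarray, fraction: float = 0.001) -> np.ndarray:
    # Boolean mask selecting the top `fraction` (e.g. 0.1%) of parameters
    # by importance; only these would be updated during fine-tuning.
    k = max(1, int(round(fraction * importance.size)))
    top_idx = np.argpartition(importance, -k)[-k:]
    mask = np.zeros(importance.shape, dtype=bool)
    mask[top_idx] = True
    return mask

# Toy usage: gradients from 8 samples over 10,000 parameters.
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 10_000))
imp = fisher_importance(grads)
mask = top_fraction_mask(imp, fraction=0.001)  # selects 10 parameters
```

During training, the gradient update would then be applied only where `mask` is true, leaving the remaining 99.9% of parameters, and with them the model's general capabilities, untouched.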