🤖 AI Summary
Addressing the inherent trade-off between acquiring new knowledge and catastrophic forgetting in continual learning for large language models (LLMs), this paper proposes Adaptive Iterative Model Merging (AimMerging). Methodologically, AimMerging introduces two key components: (1) a training trajectory-guided merge controller that monitors learning and forgetting signals to adaptively determine the timing and frequency of iterative merging; and (2) a rehearsal-based knowledge fusion module that computes the merging weights and executes the fusion. Crucially, AimMerging eliminates the need for manually predefined merging schedules. Evaluated across multiple model scales (770M to 13B) on three major continual learning benchmarks, AimMerging consistently outperforms state-of-the-art methods, achieving average relative improvements of 80% in forward transfer (FWT) and 59% in backward transfer (BWT). These results demonstrate significantly enhanced robustness and adaptability of LLMs in dynamic, non-stationary environments.
📝 Abstract
Continual learning (CL) is essential for deploying large language models (LLMs) in dynamic real-world environments without costly retraining. Recent model merging-based methods have attracted significant attention, but they still struggle to manage the trade-off between learning new knowledge and preventing forgetting, a challenge largely stemming from a suboptimal number and frequency of merges. In this paper, we introduce Adaptive Iterative Model Merging (AimMerging), a novel CL framework that utilizes learning and forgetting signals from the training trajectory to dynamically monitor the model's training status. Guided by this monitoring, the training trajectory-guided merge controller adaptively determines the timing and frequency of iterative fusion, while the rehearsal-based knowledge fusion module computes the merging weights and executes the fusion. Comprehensive experiments on three CL benchmarks with various model sizes (from 770M to 13B) demonstrate that AimMerging achieves significant performance improvements over existing state-of-the-art methods, with average relative improvements of 80% and 59% on FWT and BWT, respectively. The source code is provided for reproducibility.
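The controller-plus-fusion loop described in the abstract can be sketched as a minimal toy in Python. This is an illustrative assumption-based sketch, not the paper's implementation: the scalar learning/forgetting signals, the `should_merge` threshold rule, and the weighting formula are all hypothetical stand-ins for the paper's actual monitoring criteria and merging weights.

```python
# Toy sketch of adaptive iterative model merging (illustrative only).
# Model parameters are plain dicts of floats standing in for weight tensors.

def weighted_merge(theta_base, theta_task, alpha):
    """Weighted parameter fusion: alpha weights the newly trained model."""
    return {k: (1 - alpha) * theta_base[k] + alpha * theta_task[k]
            for k in theta_base}

def should_merge(learn_signal, forget_signal, threshold=0.5):
    """Hypothetical trigger: merge when forgetting outweighs learning."""
    return forget_signal - learn_signal > threshold

# Simulated training trajectory: (learning, forgetting) signal per step.
trajectory = [(0.9, 0.1), (0.7, 0.3), (0.2, 0.9), (0.8, 0.2)]

theta = {"w": 0.0}        # running merged (base) model
theta_task = {"w": 1.0}   # weights after training on the current task
merges = 0
for learn, forget in trajectory:
    if should_merge(learn, forget):
        # Forgetting dominates: fold task weights back into the base model,
        # with a merge weight that favors the stronger of the two signals.
        alpha = learn / (learn + forget)
        theta = weighted_merge(theta, theta_task, alpha)
        merges += 1

print(merges, round(theta["w"], 4))
```

In this toy run, only the third step (learning 0.2, forgetting 0.9) crosses the trigger, so a single merge fires; the point is that the merge schedule emerges from the monitored signals rather than from a fixed, manually chosen interval.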