🤖 AI Summary
Addressing the inherent trade-off between acquiring new knowledge and catastrophic forgetting in continual learning for large language models (LLMs), this paper proposes Adaptive Iterative Model Merging (AimMerging). Methodologically, AimMerging introduces two key components: (1) a training trajectory-guided merge controller that monitors learning and forgetting signals to adaptively determine the timing and frequency of iterative merging; and (2) a rehearsal-based knowledge fusion module that computes the merging weights and executes the fusion. Crucially, AimMerging eliminates the need for manually predefined merging schedules. Evaluated across multiple model scales (770M to 13B) on three major continual learning benchmarks, AimMerging consistently outperforms state-of-the-art methods, achieving average relative improvements of 80% in forward transfer (FWT) and 59% in backward transfer (BWT). These results demonstrate significantly enhanced robustness and adaptability of LLMs in dynamic, non-stationary environments.
📝 Abstract
Continual learning (CL) is essential for deploying large language models (LLMs) in dynamic real-world environments without costly retraining. Recent model merging-based methods have attracted significant attention, but they still struggle to manage the trade-off between learning new knowledge and preventing forgetting, a challenge largely stemming from a suboptimal number and frequency of merges. In this paper, we introduce Adaptive Iterative Model Merging (AimMerging), a novel CL framework that utilizes learning and forgetting signals from the training trajectory to dynamically monitor the model's training status. Guided by this monitoring, the training trajectory-guided merge controller adaptively determines the timing and frequency of iterative fusion, while the rehearsal-based knowledge fusion module computes the merging weights and executes the fusion. Comprehensive experiments on three CL benchmarks with various model sizes (from 770M to 13B) demonstrate that AimMerging achieves significant performance improvements over existing state-of-the-art methods, with average relative improvements of 80% and 59% on FWT and BWT, respectively. The source code is provided for reproducibility.
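The controller-plus-fusion loop described in the abstract can be sketched as a minimal toy in Python. This is an illustrative assumption-based sketch, not the paper's implementation: the scalar learning/forgetting signals, the `should_merge` threshold rule, and the weighting formula are all hypothetical stand-ins for the paper's actual monitoring criteria and merging weights.

```python
# Toy sketch of adaptive iterative model merging (illustrative only).
# Model parameters are plain dicts of floats standing in for weight tensors.

def weighted_merge(theta_base, theta_task, alpha):
    """Weighted parameter fusion: alpha weights the newly trained model."""
    return {k: (1 - alpha) * theta_base[k] + alpha * theta_task[k]
            for k in theta_base}

def should_merge(learn_signal, forget_signal, threshold=0.5):
    """Hypothetical trigger: merge when forgetting outweighs learning."""
    return forget_signal - learn_signal > threshold

# Simulated training trajectory: (learning, forgetting) signal per step.
trajectory = [(0.9, 0.1), (0.7, 0.3), (0.2, 0.9), (0.8, 0.2)]

theta = {"w": 0.0}        # running merged (base) model
theta_task = {"w": 1.0}   # weights after training on the current task
merges = 0
for learn, forget in trajectory:
    if should_merge(learn, forget):
        # Forgetting dominates: fold task weights back into the base model,
        # with a merge weight that favors the stronger of the two signals.
        alpha = learn / (learn + forget)
        theta = weighted_merge(theta, theta_task, alpha)
        merges += 1

print(merges, round(theta["w"], 4))
```

In this toy run, only the third step (learning 0.2, forgetting 0.9) crosses the trigger, so a single merge fires; the point is that the merge schedule emerges from the monitored signals rather than from a fixed, manually chosen interval.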