AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning

📅 2025-09-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the inherent trade-off between knowledge acquisition and catastrophic forgetting in continual learning of large language models (LLMs) in dynamic environments, this paper proposes Adaptive Iterative Model Merging (AimMerging). Methodologically, AimMerging introduces two key components: (1) a training trajectory-guided merge controller that monitors learning and forgetting signals to adaptively determine the timing and frequency of iterative merging; and (2) a rehearsal-based knowledge fusion module that computes the merging weights and executes the fusion. Crucially, AimMerging eliminates the need for manually predefined merging schedules. Evaluated across multiple model scales (770M to 13B) and three major continual learning benchmarks, AimMerging consistently outperforms state-of-the-art methods, achieving average relative improvements of 80% in forward transfer (FWT) and 59% in backward transfer (BWT). These results demonstrate significantly enhanced robustness and adaptability of LLMs in dynamic, non-stationary environments.

📝 Abstract
Continual learning (CL) is essential for deploying large language models (LLMs) in dynamic real-world environments without the need for costly retraining. Recent model merging-based methods have attracted significant attention, but they still struggle to effectively manage the trade-off between learning new knowledge and preventing forgetting, a challenge largely stemming from a suboptimal number of merges and merging frequency. In this paper, we introduce Adaptive Iterative Model Merging (AimMerging), a novel CL framework that utilizes learning and forgetting signals from the training trajectory to dynamically monitor the model's training status. Guided by dynamic monitoring, the training trajectory-guided merge controller adaptively determines the timing and frequency of iterative fusion, while the rehearsal-based knowledge fusion module computes the merging weights and executes the fusion. Comprehensive experiments on three CL benchmarks with various model sizes (from 770M to 13B) demonstrate that AimMerging achieves significant performance improvements over existing state-of-the-art methods, with average relative improvements of 80% and 59% on FWT and BWT, respectively. The source code is provided for reproducibility.
Problem

Research questions and friction points this paper is trying to address.

Balancing new knowledge acquisition with forgetting prevention in continual learning
Determining optimal merge timing and frequency for model fusion
Adaptively monitoring training status using learning and forgetting signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive merging using training trajectory signals
Dynamic monitoring determines merge timing and frequency
Rehearsal-based fusion module computes merging weights
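The ideas above can be illustrated with a minimal sketch of weighted iterative model merging. This is not the authors' implementation; the function names (`merge_models`, `forgetting_aware_alpha`) and the specific weighting rule are hypothetical, and parameters are represented as plain dicts of floats for simplicity.

```python
# Hypothetical sketch of rehearsal-guided weighted model merging.
# theta_prev: parameters of the merged model from earlier tasks;
# theta_new: parameters after training on the current task.

def merge_models(theta_prev, theta_new, alpha):
    """Interpolate previous and newly trained parameters.

    alpha near 1 favors the new task (plasticity); alpha near 0
    preserves earlier knowledge (stability).
    """
    return {k: (1 - alpha) * theta_prev[k] + alpha * theta_new[k]
            for k in theta_prev}

def forgetting_aware_alpha(loss_new, loss_rehearsal, base=0.5):
    """Down-weight the new model when the rehearsal loss (a proxy
    forgetting signal) rises relative to the new-task loss."""
    return base * loss_new / (loss_new + loss_rehearsal)

# Example: equal new-task and rehearsal losses halve the base weight.
alpha = forgetting_aware_alpha(loss_new=1.0, loss_rehearsal=1.0)  # 0.25
merged = merge_models({"w": 0.0}, {"w": 1.0}, alpha)  # {"w": 0.25}
```

In the paper's framework, a merge controller driven by such learning and forgetting signals would also decide *when* to invoke the merge, rather than merging on a fixed schedule.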
Yujie Feng
AI Technology Center of OVB, Tencent, China
Jian Li
AI Technology Center of OVB, Tencent, China
Xiaoyu Dong
The Hong Kong Polytechnic University, Hong Kong S.A.R.
Pengfei Xu
AI Technology Center of OVB, Tencent, China
Xiaohui Zhou
AI Technology Center of OVB, Tencent, China
Yujia Zhang
AI Technology Center of OVB, Tencent, China
Zexin Lu
Sichuan University
Yasha Wang
Peking University, China
Alan Zhao
AI Technology Center of OVB, Tencent, China
Xu Chu
Peking University, China
Xiao-Ming Wu
The Hong Kong Polytechnic University, Hong Kong S.A.R.