ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning

📅 2025-10-11
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
To address catastrophic forgetting and limited domain capacity in continual pretraining (CPT) of large language models (LLMs), this paper proposes ADEPT, an Adaptive Expansion and Dynamic Decoupled Tuning framework. It introduces a functionality-aware selective layer expansion mechanism, combined with unit-level importance-aware decoupled optimization and asymmetric learning rate scheduling, to jointly model general-capability retention and domain-specific knowledge injection. The key innovation is functionally decoupling parameter expansion from parameter updating, removing the entanglement between general and domain learning. Experiments on mathematical and medical benchmarks show that tuning only 15% of parameters cuts training time by more than 50% while outperforming full-parameter CPT by up to 5.76% on general capability and 5.58% on domain-specific performance.
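To make the importance-aware component concrete, the sketch below estimates per-layer importance for general capability with a Fisher-style squared-gradient score on general-domain batches. This is a minimal PyTorch sketch under stated assumptions: a Hugging Face LLaMA-style model exposing `model.model.layers`, and a gradient-based proxy score. The function name and the scoring criterion are illustrative, not the paper's exact method.

```python
import torch

def general_layer_importance(model, general_batches, num_layers):
    """Fisher-style proxy for how critical each transformer layer is to
    general capability: accumulate squared parameter gradients on
    general-domain batches. (Assumed proxy; the paper's exact importance
    metric may differ.)"""
    scores = torch.zeros(num_layers)
    model.train()
    for batch in general_batches:
        model.zero_grad()
        # Standard causal-LM loss with inputs reused as labels.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        for i in range(num_layers):
            for p in model.model.layers[i].parameters():
                if p.grad is not None:
                    scores[i] += p.grad.float().pow(2).sum().item()
    # Low score = least general-critical = candidate for duplication.
    return scores
```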

📝 Abstract
Conventional continual pretraining (CPT) for large language model (LLM) domain adaptation often suffers from catastrophic forgetting and limited domain capacity. Existing strategies adopt layer expansion, introducing additional trainable parameters to accommodate new knowledge. However, the uniform expansion and updates still entangle general and domain learning, undermining its effectiveness. Our pilot studies reveal that LLMs exhibit functional specialization, where layers and units differentially encode general-critical capabilities, suggesting that parameter expansion and optimization should be function-aware. We then propose ADEPT, Adaptive Expansion and Dynamic Decoupled Tuning for continual pretraining, a two-stage framework for domain-adaptive CPT. ADEPT first performs General-Competence Guided Selective Layer Expansion, duplicating layers least critical for the general domain to increase representational capacity while minimizing interference with general knowledge. It then applies Adaptive Unit-Wise Decoupled Tuning, disentangling parameter units within expanded layers according to their general-domain importance and assigning asymmetric learning rates to balance knowledge injection and retention. Experiments on mathematical and medical benchmarks show that ADEPT outperforms full-parameter CPT by up to 5.76% on the general domain and 5.58% on the target domain with only 15% of parameters tuned and less than 50% training time. Ablation studies, theoretical analysis, and extended investigations further demonstrate the necessity of targeted expansion and decoupled optimization, providing new principles for efficient and robust domain-adaptive CPT. Our code is open-sourced at https://github.com/PuppyKnightUniversity/ADEPT.
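The first stage can be pictured as follows: duplicate the layers that score lowest on general-domain importance and make only the copies trainable. This is a minimal sketch under the same LLaMA-style assumptions as above; `expand_least_critical`, the insertion position, and the freezing policy are illustrative, and whether ADEPT additionally identity-initializes the copies (as some depth-expansion methods do by zeroing output projections) is not stated here.

```python
import copy
import torch.nn as nn

def expand_least_critical(model, scores, k=4):
    """Insert a trainable copy directly after each of the k least
    general-critical layers, then freeze everything except the copies.
    (Hypothetical sketch of the expansion stage, not the authors' API.)"""
    least = set(sorted(range(len(scores)), key=lambda i: scores[i])[:k])
    layers, expanded_ids = [], []
    for i, layer in enumerate(model.model.layers):
        layers.append(layer)
        if i in least:
            expanded_ids.append(len(layers))      # index the copy will occupy
            layers.append(copy.deepcopy(layer))   # copy absorbs domain knowledge
    model.model.layers = nn.ModuleList(layers)
    model.config.num_hidden_layers = len(layers)
    for p in model.parameters():                  # retain: freeze the base model
        p.requires_grad = False
    for idx in expanded_ids:                      # inject: tune only the copies
        for p in model.model.layers[idx].parameters():
            p.requires_grad = True
    return model, expanded_ids
```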
Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic forgetting in continual pretraining of large language models
Solves limited domain capacity through adaptive layer expansion strategies
Separates general and domain learning via decoupled parameter optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective layer expansion for domain capacity
Unit-wise decoupled tuning for knowledge balance
Asymmetric learning rates for retention and injection (see the sketch after this list)
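Unit-wise decoupling with asymmetric learning rates maps naturally onto optimizer parameter groups: within the expanded layers, general-critical units get a small learning rate (retention) while the remaining units get a large one (injection). A hedged sketch only; `unit_importance`, the threshold `tau`, and both learning rates are placeholder assumptions, not values from the paper.

```python
import torch

def build_asymmetric_optimizer(model, expanded_ids, unit_importance,
                               lr_domain=2e-5, lr_general=2e-6, tau=0.5):
    """Split parameter units in the expanded layers by general-domain
    importance and give each group its own learning rate.
    `unit_importance` maps "<layer_idx>.<param_name>" to a score in [0, 1];
    this keying scheme and the thresholds are hypothetical."""
    retain, inject = [], []
    for idx in expanded_ids:
        for name, p in model.model.layers[idx].named_parameters():
            if unit_importance.get(f"{idx}.{name}", 0.0) >= tau:
                retain.append(p)   # general-critical: update cautiously
            else:
                inject.append(p)   # free capacity: update aggressively
    return torch.optim.AdamW([
        {"params": inject, "lr": lr_domain},
        {"params": retain, "lr": lr_general},
    ])
```

Grouping by importance rather than by layer is what makes the tuning unit-wise: two weight matrices inside the same expanded layer can receive different learning rates.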
Authors

Jinyang Zhang
Key Laboratory of High Confidence Software Technologies, Ministry of Education; School of Computer Science, Peking University, Beijing, China
Yue Fang
Key Laboratory of High Confidence Software Technologies, Ministry of Education; School of Computer Science, Peking University, Beijing, China
Hongxin Ding
Key Laboratory of High Confidence Software Technologies, Ministry of Education; School of Computer Science, Peking University, Beijing, China
Weibin Liao
Peking University
Large Language Model · Reinforcement Learning · Medical Image Analysis
Muyang Ye
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Xu Chu
Key Laboratory of High Confidence Software Technologies, Ministry of Education; School of Computer Science, Peking University, Beijing, China
Junfeng Zhao
Yasha Wang
Key Laboratory of High Confidence Software Technologies, Ministry of Education; School of Computer Science, Peking University, Beijing, China