BECAME: BayEsian Continual Learning with Adaptive Model MErging

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
In continual learning, balancing stability (resistance to forgetting) and plasticity (capacity to acquire new knowledge) remains a fundamental challenge. Existing gradient projection methods preserve stability at the expense of plasticity, while model merging approaches, though promising, often rely on heuristic assumptions and manual hyperparameter tuning. This paper introduces the first Bayesian-driven adaptive model fusion framework: it integrates Bayesian continual learning principles into model merging and derives task-adaptive, closed-form optimal fusion coefficients via rigorous theoretical analysis, eliminating hyperparameter sensitivity and heuristic design. The method combines gradient projection and adaptive merging in a two-stage collaborative optimization. Evaluated on multiple standard benchmarks, it consistently outperforms state-of-the-art methods and mainstream merging strategies, achieving superior stability and plasticity simultaneously, thereby mitigating catastrophic forgetting while improving generalization on novel tasks.
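To make the gradient-projection side of the stability story concrete, the sketch below shows the standard GPM-style operation: new-task gradients are projected onto the orthogonal complement of a subspace important to earlier tasks, so updates cannot disturb those directions. This is an illustrative sketch of the generic technique, not the paper's exact procedure; `project_gradient` and `basis` are hypothetical names introduced here.

```python
import numpy as np

def project_gradient(grad, basis):
    """Project `grad` onto the orthogonal complement of the subspace
    spanned by the orthonormal columns of `basis` (directions deemed
    important to previous tasks). The projected gradient has zero
    component along those directions, preserving prior knowledge.

    Generic gradient-projection sketch (GPM-style), not the paper's
    exact algorithm.
    """
    # Subtract the component of the gradient lying inside the
    # prior-task subspace: g <- g - M (M^T g)
    return grad - basis @ (basis.T @ grad)
```

With `basis` spanning the first coordinate axis, a gradient `[3, 4]` projects to `[0, 4]`: the direction reserved for old tasks is untouched, while the orthogonal direction remains free for learning the new task.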

📝 Abstract
Continual Learning (CL) strives to learn incrementally across tasks while mitigating catastrophic forgetting. A key challenge in CL is balancing stability (retaining prior knowledge) and plasticity (learning new tasks). While representative gradient projection methods ensure stability, they often limit plasticity. Model merging techniques offer promising solutions, but prior methods typically rely on empirical assumptions and carefully selected hyperparameters. In this paper, we explore the potential of model merging to enhance the stability-plasticity trade-off, providing theoretical insights that underscore its benefits. Specifically, we reformulate the merging mechanism using Bayesian continual learning principles and derive a closed-form solution for the optimal merging coefficient that adapts to the diverse characteristics of tasks. To validate our approach, we introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging. Extensive experiments show that our approach outperforms state-of-the-art CL methods and existing merging strategies.
Problem

Research questions and friction points this paper is trying to address.

Balancing stability and plasticity in continual learning
Improving model merging with Bayesian principles
Adapting merging coefficients to diverse task characteristics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian continual learning with adaptive merging
Closed-form solution for optimal merging coefficient
Two-stage framework combining gradient projection and merging
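The "closed-form merging coefficient" idea can be illustrated with a standard Bayesian argument: if each task's posterior is approximated as a diagonal Gaussian with precision given by the (diagonal) Fisher information, the product of the posteriors has a mean that is a precision-weighted average of the two parameter vectors, so the merging coefficient falls out in closed form rather than being hand-tuned. The sketch below is this generic precision-weighted merge under stated assumptions, not the paper's derivation; `adaptive_merge` is a hypothetical name.

```python
import numpy as np

def adaptive_merge(theta_old, theta_new, fisher_old, fisher_new, eps=1e-8):
    """Merge two task-specific parameter vectors with a per-parameter
    closed-form coefficient.

    Assumption: each posterior is N(theta_i, diag(1 / fisher_i)).
    The product of the two Gaussians then has mean
        alpha * theta_old + (1 - alpha) * theta_new,
    with alpha = fisher_old / (fisher_old + fisher_new).
    No merging hyperparameter needs manual tuning; `eps` only guards
    against division by zero.
    """
    alpha = fisher_old / (fisher_old + fisher_new + eps)
    return alpha * theta_old + (1.0 - alpha) * theta_new
```

With equal Fisher values the merge reduces to a plain average; parameters the old tasks relied on heavily (large `fisher_old`) stay close to `theta_old`, while weakly constrained parameters move toward the new task's solution, which is exactly the stability-plasticity trade-off the coefficient is meant to adapt.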
Mei Li
Shanghai Jiao Tong University
Yuxiang Lu
Shanghai Jiao Tong University
Qinyan Dai
Shanghai Jiao Tong University
Suizhi Huang
Nanyang Technological University
Computer Vision · Federated Learning · Multi-task Learning
Yue Ding
Shanghai Jiao Tong University
Hongtao Lu
Shanghai Jiao Tong University
Artificial Intelligence · Machine Learning · Computer Vision