Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning

📅 2024-03-27
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
Addressing the stability-plasticity dilemma, catastrophic forgetting, linear adapter growth, and limited knowledge reuse in continual learning with pre-trained models (PTMs), this paper proposes the Self-Expanding Modular Adapter framework (SEMA). Methodologically, SEMA introduces: (1) a distribution-shift-driven self-expansion mechanism that detects task transitions via multi-level representation descriptors and adds adapters on demand; (2) an expandable weighted routing scheme that mixes adapter outputs while keeping parameter growth sub-linear; and (3) a frozen-backbone, lightweight fine-tuning paradigm. In the challenging replay-free setting, SEMA achieves state-of-the-art performance, significantly outperforming existing PTM-based continual learning approaches. The empirical results indicate that controlled architectural expansion and adaptive knowledge integration together yield sub-linear adapter scaling and improved generalization.
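The expandable weighted routing in point (2) can be pictured as a softmax-weighted mixture over adapter outputs. The sketch below is a minimal illustration, not the paper's implementation: the function name, the use of dense square matrices as stand-ins for low-rank adapters, and the single-vector input are all assumptions.

```python
import numpy as np

def mixture_of_adapters(x, adapters, router_weights):
    """Combine adapter outputs with softmax routing weights.

    x              : (d,) input feature vector
    adapters       : list of (d, d) matrices (stand-ins for low-rank adapters)
    router_weights : (n_adapters, d) routing matrix (learned in SEMA)
    """
    logits = router_weights @ x                    # one logit per adapter
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                       # softmax over adapters
    outputs = np.stack([A @ x for A in adapters])  # (n_adapters, d)
    return weights @ outputs                       # convex mixture of outputs

# toy example: two adapters on a 4-dimensional feature
rng = np.random.default_rng(0)
x = rng.normal(size=4)
adapters = [rng.normal(size=(4, 4)) for _ in range(2)]
router = rng.normal(size=(2, 4))
y = mixture_of_adapters(x, adapters, router)
print(y.shape)  # (4,)
```

Because the router outputs a weight per adapter, appending one row to `router_weights` when a new adapter is added is enough to make the routing expandable.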

📝 Abstract
Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophic forgetting of learned knowledge, requiring a balance between stability and adaptability. Relying on the generalizable representations in pre-trained models (PTMs), PTM-based CL methods perform effective continual adaptation on downstream tasks by adding learnable adapters or prompts on top of the frozen PTMs. However, many existing PTM-based CL methods restrict adaptation to a fixed set of these modules to avoid forgetting, suffering from limited CL ability, while periodically adding task-specific modules results in a linear model growth rate and impaired knowledge reuse. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel approach to enhance the control of the stability-plasticity balance in PTM-based CL. SEMA automatically decides whether to reuse or add adapter modules on demand in CL, depending on whether a significant distribution shift that cannot be handled is detected at different representation levels. We design a modular adapter consisting of a functional adapter and a representation descriptor. The representation descriptors are trained as distribution shift indicators and used to trigger self-expansion signals. To better compose the adapters, an expandable weighting router is learned jointly for the mixture of adapter outputs. SEMA enables better knowledge reuse and a sub-linear expansion rate. Extensive experiments demonstrate the effectiveness of the proposed self-expansion method, achieving state-of-the-art performance compared to PTM-based CL methods without memory rehearsal. Code is available at https://github.com/huiyiwang01/SEMA-CL.
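The self-expansion decision described in the abstract can be sketched as follows. This toy version uses a per-adapter mean-feature vector with a distance threshold as a crude stand-in for SEMA's learned representation descriptors; the class name, threshold value, and distance metric are illustrative assumptions, not the paper's method.

```python
import numpy as np

class SelfExpandingLayer:
    """Toy self-expansion logic: keep one mean-feature descriptor per adapter;
    if an incoming batch is far from every descriptor, add a new adapter.
    Illustrative stand-in for SEMA's trained representation descriptors."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.descriptors = []  # one mean vector per existing adapter

    def observe(self, batch):
        mean = batch.mean(axis=0)
        if not self.descriptors:
            self.descriptors.append(mean)
            return "expand"
        dists = [np.linalg.norm(mean - d) for d in self.descriptors]
        if min(dists) > self.threshold:       # unhandled distribution shift
            self.descriptors.append(mean)     # trigger self-expansion
            return "expand"
        return "reuse"                        # route to existing adapters

layer = SelfExpandingLayer(threshold=2.0)
rng = np.random.default_rng(1)
t1 = rng.normal(0.0, 1.0, size=(32, 8))  # task 1 features
t2 = rng.normal(5.0, 1.0, size=(32, 8))  # shifted task 2 features
print(layer.observe(t1), layer.observe(t2), layer.observe(t2))
# expand expand reuse
```

Because expansion fires only on an unhandled shift, repeated data from a known distribution reuses existing adapters, which is what keeps the growth rate sub-linear in the number of tasks.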
Problem

Research questions and friction points this paper is trying to address.

Enhance stability-plasticity balance in continual learning
Automate adapter reuse or addition based on distribution shifts
Achieve sub-linear model growth with better knowledge reuse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Expansion with Modularized Adaptation (SEMA)
Reuse or add adapter modules dynamically
Expandable weighting router for adapter mixture