Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models

📅 2024-11-01
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Catastrophic forgetting in vision transformers (ViTs) during sequential multi-task learning, exacerbated by domain shift in large-scale models. Method: We propose a dual low-rank adaptation framework: (i) orthogonal LoRA constrains parameter updates to the historical task subspace to preserve prior knowledge, while (ii) residual LoRA models new tasks exclusively in the orthogonal complement subspace; further enhanced by dynamic memory scheduling to balance stability and plasticity. The approach is built upon parameter-efficient fine-tuning (PEFT), integrating orthogonal subspace optimization with residual subspace decomposition. Results: Our method achieves state-of-the-art accuracy across multiple continual learning benchmarks—including Split-CIFAR100, Domain-Net, and CORe50—while reducing GPU memory consumption by up to 42% and accelerating inference by 1.8× compared to existing approaches. It consistently outperforms prior PEFT-based and replay-free continual learning methods in both classification accuracy and efficiency.

Technology Category

Application Category

📝 Abstract
In the era of foundation models, we revisit continual learning~(CL), which aims to enable vision transformers (ViTs) to learn new tasks over time. However, as the scale of these models increases, catastrophic forgetting remains a persistent challenge, particularly in the presence of significant domain shifts across tasks. Recent studies highlight a crossover between CL techniques and parameter-efficient fine-tuning (PEFT), which focuses on fine-tuning only a small set of trainable parameters to adapt to downstream tasks, such as low-rank adaptation (LoRA). While LoRA achieves faster convergence and requires fewer trainable parameters, it has seldom been explored in the context of continual learning. To address this gap, we propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA), which introduces both an orthogonal LoRA adapter and a residual LoRA adapter parallel to pre-trained weights in each layer. These components are orchestrated by a dynamic memory mechanism to strike a balance between stability and plasticity. The orthogonal LoRA adapter's parameters are updated in an orthogonal subspace of previous tasks to mitigate catastrophic forgetting, while the residual LoRA adapter's parameters are updated in the residual subspace spanned by task-specific bases without interaction across tasks, offering complementary capabilities for fine-tuning new tasks. On ViT-based models, we demonstrate that DualLoRA offers significant advantages in accuracy, inference speed, and memory efficiency over existing CL methods across multiple benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Address catastrophic forgetting in continual learning with ViTs
Combine CL and PEFT via DualLoRA for efficient adaptation
Balance stability and plasticity using orthogonal and residual adapters
Innovation

Methods, ideas, or system contributions that make the work stand out.

DualLoRA combines orthogonal and residual LoRA adapters
Dynamic memory balances stability and plasticity
Orthogonal updates mitigate catastrophic forgetting effectively
🔎 Similar Papers
No similar papers found.
H
Huancheng Chen
University of Texas at Austin
J
Jingtao Li
SonyAI
N
Nidham Gazagnadou
SonyAI
Weiming Zhuang
Weiming Zhuang
Sony AI
Foundation ModelFederated Learning
C
Chen Chen
SonyAI
Lingjuan Lyu
Lingjuan Lyu
Sony
Foundation ModelsFederated LearningResponsible AI