Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of poor task adaptability and catastrophic forgetting in vision-language models under continual learning settings by proposing an efficient, task-ID-free approach based on LoRA. The method decomposes a single LoRA module into a shared pool of rank-1 experts and dynamically composes task-specific updates through sparse combinations guided by the semantic content of the [CLS] token, substantially reducing parameter overhead. Additionally, an Activation-Guided Orthogonal (AGO) loss is introduced to mitigate interference across tasks. Requiring no external knowledge or task identifiers and adding no inference latency, the approach achieves state-of-the-art performance across multiple benchmarks with only 3.3% trainable parameters, demonstrating generalization capabilities that even surpass the zero-shot upper bound.
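A minimal PyTorch sketch of the composition idea described above, assuming a standard LoRA-style adapter: the single low-rank update is split into rank-1 experts that are sparsely recombined per sample by a gate conditioned on the [CLS] token. The names (`Rank1ExpertLoRA`, `num_experts`, `top_k`) and the top-k softmax gating are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class Rank1ExpertLoRA(nn.Module):
    """Sketch: one LoRA module decomposed into a pool of rank-1 experts,
    sparsely composed per sample by a [CLS]-conditioned gate."""

    def __init__(self, d_in: int, d_out: int, num_experts: int = 16,
                 top_k: int = 4, scale: float = 1.0):
        super().__init__()
        # Expert i contributes a rank-1 update b_i a_i^T to the frozen weight.
        self.A = nn.Parameter(torch.randn(num_experts, d_in) * 0.01)  # rows a_i (down-projection)
        self.B = nn.Parameter(torch.zeros(num_experts, d_out))        # rows b_i (up-projection)
        self.router = nn.Linear(d_in, num_experts)                    # gate driven by [CLS] semantics
        self.top_k = top_k
        self.scale = scale

    def forward(self, x: torch.Tensor, cls_token: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in), cls_token: (batch, d_in)
        logits = self.router(cls_token)                              # (batch, num_experts)
        top_val, top_idx = logits.topk(self.top_k, dim=-1)
        gates = torch.zeros_like(logits)
        gates.scatter_(-1, top_idx, top_val.softmax(dim=-1))         # sparse, normalized gate
        down = torch.einsum('bsd,ed->bse', x, self.A)                # project onto each a_i
        down = down * gates.unsqueeze(1)                             # keep only the selected experts
        delta = torch.einsum('bse,eo->bso', down, self.B)            # recombine with b_i
        return self.scale * delta                                    # added to the frozen layer output
```

In use, `delta` would be added to the output of the frozen pre-trained projection, so only the small expert pool and the router are trainable, which is consistent with the parameter savings reported above.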

📝 Abstract
Continual learning (CL) in vision-language models (VLMs) faces significant challenges in improving task adaptation and avoiding catastrophic forgetting. Existing methods usually incur a heavy inference burden or rely on external knowledge, whereas Low-Rank Adaptation (LoRA) has shown potential to reduce these issues through parameter-efficient tuning. However, since directly using LoRA to alleviate catastrophic forgetting is non-trivial, we introduce a novel framework that restructures a single LoRA module as a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [CLS] token. In addition, we propose an Activation-Guided Orthogonal (AGO) loss that orthogonalizes critical parts of the LoRA weights across tasks. This sparse composition and orthogonalization require fewer parameter updates, enabling domain-aware learning while minimizing inter-task interference and maintaining downstream task performance. Extensive experiments across multiple settings demonstrate state-of-the-art results on all metrics, surpassing the zero-shot upper bound in generalization. Notably, our method reduces trainable parameters by 96.7% compared to the baseline and eliminates reliance on external datasets or task-ID discriminators. The merged LoRAs retain fewer weights and incur no additional inference latency, making our method computationally lightweight.
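The abstract describes the AGO loss only at a high level. The sketch below is a simplified stand-in that penalizes alignment between the current task's trainable rank-1 directions and the directions frozen after earlier tasks, optionally weighted by an activation-based importance score; the exact weighting and which weight factors are orthogonalized are assumptions rather than the paper's formulation.

```python
import torch
import torch.nn.functional as F
from typing import Optional


def activation_guided_orthogonality(A_curr: torch.Tensor,
                                    A_prev: torch.Tensor,
                                    importance: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Simplified stand-in for the AGO loss.

    A_curr: (E, d) trainable rank-1 directions of the current task.
    A_prev: (P, d) directions frozen after previous tasks.
    importance: assumed per-expert activation statistic of shape (P,),
    focusing the penalty on experts that earlier tasks relied on.
    """
    a_c = F.normalize(A_curr, dim=-1)
    a_p = F.normalize(A_prev.detach(), dim=-1)       # previous-task directions stay fixed
    overlap = (a_c @ a_p.T) ** 2                     # (E, P) squared cosine similarities
    if importance is not None:
        overlap = overlap * importance.unsqueeze(0)  # weight by activation importance
    return overlap.mean()
```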
Problem

Research questions and friction points this paper is trying to address.

continual learning
vision-language models
catastrophic forgetting
task adaptation
parameter efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rank-1 Expert Pool
LoRA
Continual Learning
Activation-Guided Orthogonal Loss
Parameter-Efficient Tuning
Zhan Fa
National Key Laboratory for Novel Software Technology, Nanjing University, China
Yue Duan
Nanjing University
Semi-supervised Learning, Multimodal Learning, Large Multimodal Models
Jian Zhang
Institute of Software, Chinese Academy of Sciences
automated reasoning, program analysis, software testing, constraint solving
Lei Qi
Southeast University
Computer Vision, Pattern Recognition
Wanqi Yang
School of Computer and Electronic Information, Nanjing Normal University, China
Yinghuan Shi
National Key Laboratory for Novel Software Technology, Nanjing University, China