Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA

📅 2026-01-30

🤖 AI Summary

This work targets two problems that vision-language models face in continual learning: poor task adaptability and catastrophic forgetting. It proposes an efficient, task-ID-free approach based on LoRA. The method decomposes a single LoRA module into a shared pool of rank-1 experts. It then builds each task-specific update dynamically as a sparse combination of these experts, with the selection guided by the semantic content of the [CLS] token. This substantially reduces parameter overhead. An Activation-Guided Orthogonal (AGO) loss is also introduced to reduce interference across tasks. The approach needs no external knowledge or task identifiers and adds no inference latency. It achieves state-of-the-art performance across multiple benchmarks while training only 3.3% as many parameters as the baseline, and its generalization even surpasses the zero-shot upper bound.
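The decomposition idea can be sketched in a few lines. A rank-r LoRA update B A is exactly a sum of r outer products b_i a_i^T, where b_i is column i of B and a_i^T is row i of A. Each rank-1 term can therefore act as an expert, and a router can pick k of them per input. The router below is an assumption for illustration only. It scores experts against hypothetical per-expert keys, keeps the top-k, and applies a softmax over the survivors; the summary says only that selection is guided by the [CLS] token. The final lines check the standard LoRA merge identity for a fixed selection, (W + ΔW)x = Wx + ΔWx.

```python
import numpy as np

def rank1_experts(B, A):
    """Split a LoRA update B @ A into its r rank-1 terms b_i a_i^T."""
    # B: (d_out, r), A: (r, d_in); expert i = outer(column i of B, row i of A)
    return [np.outer(B[:, i], A[i, :]) for i in range(B.shape[1])]

def select_experts(cls_feat, keys, k):
    """Hypothetical router: score each expert against the [CLS] feature, keep the top-k."""
    scores = keys @ cls_feat                      # (r,) one score per rank-1 expert
    top = np.argsort(scores)[::-1][:k]            # indices of the k highest scores
    g = np.exp(scores[top] - scores[top].max())   # softmax restricted to the selected experts
    return top, g / g.sum()

def composed_update(B, A, cls_feat, keys, k):
    """Sparse update: delta W = sum over selected i of g_i * b_i a_i^T."""
    top, gates = select_experts(cls_feat, keys, k)
    # Same as B[:, top] @ np.diag(gates) @ A[top, :], without building the diagonal matrix
    return (B[:, top] * gates) @ A[top, :], top, gates

rng = np.random.default_rng(0)
d_out, d_in, d_cls, r, k = 8, 6, 5, 16, 4
B = rng.normal(size=(d_out, r))          # shared pool, column factors
A = rng.normal(size=(r, d_in))           # shared pool, row factors
keys = rng.normal(size=(r, d_cls))       # hypothetical per-expert routing keys
cls_feat = rng.normal(size=d_cls)        # stand-in for a [CLS] embedding
delta, top, gates = composed_update(B, A, cls_feat, keys, k)

W = rng.normal(size=(d_out, d_in))       # frozen base weight
x = rng.normal(size=d_in)
unmerged = W @ x + (B[:, top] * gates) @ (A[top, :] @ x)  # adapter run as a side branch
merged = (W + delta) @ x                                  # adapter folded into W: one matmul
```

Restricting the softmax to the selected experts keeps the gates summing to one. A real router might instead use a learned projection of the [CLS] feature or a different sparsity rule. Whatever the rule, the composed update has rank at most k, so only the selected experts are touched for a given input.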

📝 Abstract

Continual learning (CL) in vision-language models (VLMs) faces significant challenges in improving task adaptation and avoiding catastrophic forgetting. Existing methods usually impose a heavy inference burden or rely on external knowledge. Low-Rank Adaptation (LoRA) has shown potential for reducing these issues by enabling parameter-efficient tuning. However, directly using LoRA to alleviate catastrophic forgetting is non-trivial, so we introduce a novel framework that restructures a single LoRA module as a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [CLS] token. In addition, we propose an Activation-Guided Orthogonal (AGO) loss that orthogonalizes critical parts of the LoRA weights across tasks. Together, sparse composition and orthogonalization require fewer parameter updates. This enables domain-aware learning while minimizing inter-task interference and maintaining downstream task performance. Extensive experiments across multiple settings demonstrate state-of-the-art results on all metrics, surpassing zero-shot upper bounds in generalization. Notably, our method reduces trainable parameters by 96.7% compared to the baseline method and eliminates reliance on external datasets or task-ID discriminators. The merged LoRAs retain fewer weights and incur no inference latency, making our method computationally lightweight.
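The abstract describes the AGO loss only at a high level ("orthogonalizes critical parts of LoRA weights across tasks"). The sketch below shows one plausible reading, which is an assumption throughout. Each rank-1 expert is weighted by how strongly it was activated. The loss then penalizes squared cosine overlap between the current rows of A and a frozen snapshot from earlier tasks, scaled by those activation weights. The names `activation_weights` and `ago_loss`, the choice of A rather than B, and the weighting rule are all hypothetical, and the paper's actual definition may differ.

```python
import numpy as np

def activation_weights(gate_history, r):
    """Hypothetical importance: mean gate value each of the r experts received on a task."""
    # gate_history: one (selected_indices, gate_values) pair per input seen by the router
    w = np.zeros(r)
    for idx, g in gate_history:
        w[idx] += g                                # indices within one pair are unique
    return w / max(len(gate_history), 1)

def ago_loss(A_cur, A_prev, w_cur, w_prev):
    """Hypothetical activation-weighted orthogonality penalty between two sets of LoRA rows."""
    def unit(M):
        return M / np.linalg.norm(M, axis=1, keepdims=True)
    overlap = unit(A_cur) @ unit(A_prev).T         # cosine similarity of every row pair
    weights = np.outer(w_cur, w_prev)              # large only where both experts were active
    return float(np.sum(weights * overlap ** 2))

# Orthogonal rows incur no penalty; identical rows are penalized by their shared activation.
I8 = np.eye(8)
w = activation_weights([(np.array([0, 2]), np.array([0.7, 0.3]))] * 5, r=4)
loss_orth = ago_loss(I8[4:], I8[:4], w, w)   # current rows span coords 4-7, previous rows 0-3
loss_same = ago_loss(I8[:4], I8[:4], w, w)   # identical rows: penalty = sum of w_i * w_i
```

In an autodiff framework, a term like this would be added to the task loss with a coefficient, for example task_loss + λ · AGO. Gradients would flow only into the current weights, and the earlier-task snapshot would stay frozen. Weighting by activation concentrates the constraint on experts that both tasks actually rely on, leaving rarely used experts free to adapt.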

Problem

Research questions and friction points this paper is trying to address.

continual learning

vision-language models

catastrophic forgetting

task adaptation

parameter efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rank-1 Expert Pool

LoRA

Continual Learning