iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenges of parameter explosion and negative transfer in vision-language continual learning, which arise from task isolation and alignment based solely on superficial similarity. The authors reformulate shared alignment as a geometric problem of optimizing trajectory overlap within a low-rank subspace and propose an implicit gradient subspace projection mechanism. Their two-stage strategy first identifies a shared subspace to maximize knowledge reuse and then fine-tunes task-specific residuals in an orthogonal subspace, thereby circumventing conventional similarity assumptions. Leveraging the early-convergence property of MoE routers to construct subspace bases, the method integrates subspace-constrained regularization, basis pre-expansion, and routing-probability-driven dimension pruning. Evaluated on the MTIL benchmark, it achieves state-of-the-art performance while reducing trainable parameters by 42.7% and total model size by 86.9%.
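The geometric idea in the summary, a shared subspace plus a task-specific residual in the orthogonal complement, can be illustrated with a small sketch. This is not the authors' implementation: iGSP is described as projecting gradients implicitly through a regularizer, whereas the code below uses an explicit Gram-Schmidt projection purely for illustration. The function names, the toy vectors, and the penalty form are all assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis

def split_gradient(grad, basis):
    """Split grad into its part inside span(basis) and the orthogonal residual."""
    shared = [0.0] * len(grad)
    for b in basis:
        c = dot(grad, b)
        shared = [s + c * bi for s, bi in zip(shared, b)]
    residual = [g - s for g, s in zip(grad, shared)]
    return shared, residual

def subspace_penalty(grad, basis):
    """Squared norm of the part of grad lying outside span(basis).

    Hypothetical stand-in for a subspace-constrained regularizer: adding it
    to a loss would discourage updates that leave the historical subspace.
    """
    _, residual = split_gradient(grad, basis)
    return dot(residual, residual)

# Toy example (assumed values): historical directions span the x-y plane in R^3.
historical = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
new_task_grad = [2.0, -1.0, 3.0]
shared, residual = split_gradient(new_task_grad, historical)
penalty = subspace_penalty(new_task_grad, historical)
```

In this toy case the reusable component is `[2.0, -1.0, 0.0]`, the task-specific residual is `[0.0, 0.0, 3.0]`, and the penalty is `9.0`. A first phase that minimizes such a penalty would favor reuse of the historical subspace; a second phase that drops it would be free to fit the residual direction.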
📝 Abstract
Vision-Language Models require efficient adaptation to continually emerging downstream tasks. While Parameter-Efficient Fine-Tuning mitigates catastrophic forgetting, assigning isolated modules per task leads to parameter explosion. Conversely, recent similarity-driven sharing mechanisms falsely equate superficial visual similarity with underlying alignment consistency. This fundamental mismatch triggers severe negative transfer between visually similar but logically distinct tasks and fails to exploit alignment reuse across visually diverse ones. We argue that alignment sharing is fundamentally a geometric problem of overlapping optimization trajectories within shared low-rank subspaces. Grounded in this insight, we propose iGSP, a novel framework that achieves efficient adaptation via implicit gradient subspace projection. Leveraging the early convergence of MoE routers to establish the subspace basis, iGSP splits the adaptation process into two phases. First, the Subspace Identification phase maximizes knowledge reuse: it introduces candidate experts via basis pre-expansion, applies a novel subspace-constrained regularization to implicitly project new task gradients onto the historical subspace, and prunes redundant dimensions by treating routing probabilities as gradient flow indicators. Second, the Orthogonal Subspace Fine-Tuning phase fixes this structural basis and removes the regularization to rapidly fit the task-specific residual loss. Extensive experiments on the MTIL benchmark demonstrate that iGSP achieves state-of-the-art accuracy while significantly improving training efficiency, reducing the average trainable parameters by 42.7% compared to current SOTA methods, and decreasing the final total parameters by 86.9% relative to counterparts. The source code is available at https://github.com/GeoX-Lab/iGSP.
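The pruning step in the Subspace Identification phase treats routing probabilities as indicators of which candidate dimensions carry gradient flow. A minimal sketch of that idea follows, assuming a simple rule that is not taken from the paper: average each expert's routing probability over a batch and drop candidate experts whose mean falls below a threshold. The function name, the threshold value, and the sample probabilities are hypothetical.

```python
def prune_candidates(routing_probs, candidate_ids, threshold=0.05):
    """Keep candidate experts whose mean routing probability meets the threshold.

    routing_probs: list of per-sample probability lists, one entry per expert.
    candidate_ids: indices of newly added (pre-expanded) candidate experts.
    threshold:     hypothetical cutoff; the paper's actual criterion may differ.
    """
    n = len(routing_probs)
    num_experts = len(routing_probs[0])
    mean = [sum(row[e] for row in routing_probs) / n for e in range(num_experts)]
    kept = [e for e in candidate_ids if mean[e] >= threshold]
    pruned = [e for e in candidate_ids if mean[e] < threshold]
    return kept, pruned, mean

# Assumed router outputs for three samples over four experts.
# Experts 0 and 1 are historical; experts 2 and 3 are candidates.
probs = [
    [0.60, 0.30, 0.09, 0.01],
    [0.50, 0.40, 0.08, 0.02],
    [0.70, 0.20, 0.07, 0.03],
]
kept, pruned, mean = prune_candidates(probs, candidate_ids=[2, 3])
```

Here candidate 2 (mean probability about 0.08) is kept and candidate 3 (about 0.02) is pruned. Because most routing mass stays on the historical experts, few new dimensions would be retained, which matches the stated goal of maximizing knowledge reuse.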
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
Vision-Language Models
Parameter-Efficient Fine-Tuning
Negative Transfer
Alignment Reuse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit Gradient Subspace Projection
Continual Learning
Vision-Language Models
Parameter-Efficient Fine-Tuning
Subspace Sharing
Xuezhi Cui
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Dongbo Zhou
Central China Normal University
Visual Analytics, Big Data
Wang Guo
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Zeyuan Wang
PhD, The University of Sydney
NLP, Medical Informatics
Ziyu Li
Philips I&D Data & AI
Knowledge Extraction, Query Optimization, Machine Learning, Graph
Gaozhi Zhou
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Xian Li
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Ling Zhao
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Wentao Yang
School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
Chao Tao
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Haifeng Li
Central South University
GIS, Remote sensing, Machine learning, Sparse representation, Brain Theory