Continual Learning with Vision-Language Models via Semantic-Geometry Preservation

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses catastrophic forgetting in pretrained vision-language models during continual learning, which often distorts the cross-modal semantic geometry — particularly at the boundary between old and new tasks. To mitigate this, the authors propose SeGP-CL, a framework that explicitly models and preserves this geometric structure. SeGP-CL employs dual-targeted projected gradient descent (DPGD) to generate adversarial anchors that locate vulnerable regions prone to semantic drift. These anchors guide anchor-guided cross-modal geometry distillation (ACGD), complemented by a lightweight text semantic-geometry regularization (TSGR), enabling stable learning under a strict exemplar-free constraint. The framework further incorporates prototype transfer and a dual-path fusion inference strategy. Extensive experiments show that SeGP-CL achieves state-of-the-art performance across five continual learning benchmarks, improving model stability, forward transfer, and semantic geometric consistency.
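The DPGD step described above can be sketched as a standard PGD loop with a dual target: pull a seed image's embedding toward old-class text semantics while projecting the perturbation back into a small pixel-space ball so the anchor stays visually faithful. This is a minimal illustration, not the paper's implementation; the function name, loss form, and hyperparameters are assumptions.

```python
# Hypothetical sketch of dual-targeted PGD (DPGD) anchor generation.
# Loss form and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def dpgd_anchors(image_encoder, seeds, old_text_emb, steps=10, alpha=0.01, eps=0.03):
    """Drive new-task seed inputs toward old-class text semantics while
    keeping them within an eps-ball of the originals in raw space."""
    x = seeds.clone().detach()
    for _ in range(steps):
        x.requires_grad_(True)
        z = F.normalize(image_encoder(x), dim=-1)
        # Target 1: maximize similarity to the nearest old-class text embedding.
        sim_old = (z @ old_text_emb.t()).max(dim=-1).values.mean()
        grad, = torch.autograd.grad(-sim_old, x)
        with torch.no_grad():
            x = x - alpha * grad.sign()
            # Target 2 (faithfulness): project back into the raw-space eps-ball.
            x = seeds + (x - seeds).clamp(-eps, eps)
            x = x.clamp(0.0, 1.0)
        x = x.detach()
    return x
```

Any differentiable encoder works here; in the paper's setting it would be the VLM's frozen or adapting image encoder.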

📝 Abstract
Continual learning of pretrained vision-language models (VLMs) is prone to catastrophic forgetting, yet current approaches adapt to new tasks without explicitly preserving the cross-modal semantic geometry inherited from pretraining and previous stages, allowing new-task supervision to induce geometric distortion. We observe that the most pronounced drift tends to concentrate in vulnerable neighborhoods near the old-new semantic interface, where shared visual patterns are easily re-explained by new textual semantics. To address this under an exemplar-free constraint, we propose Semantic Geometry Preservation for Continual Learning (SeGP-CL). SeGP-CL first probes the drift-prone region by constructing a compact set of adversarial anchors with dual-targeted projected gradient descent (DPGD), which drives selected new-task seeds toward old-class semantics while remaining faithful in raw visual space. During training, we preserve cross-modal structure by anchor-guided cross-modal geometry distillation (ACGD), and stabilize the textual reference frame across tasks via a lightweight text semantic-geometry regularization (TSGR). After training, we estimate anchor-induced raw-space drift to transfer old visual prototypes and perform dual-path inference by fusing cross-modal and visual cues. Extensive experiments on five continual learning benchmarks demonstrate that SeGP-CL consistently improves stability and forward transfer, achieving state-of-the-art performance while better preserving the semantic geometry of VLMs.
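The two preservation objectives named in the abstract admit a simple reading: ACGD matches the anchor-to-text similarity structure of the current model against a frozen teacher (the previous-stage model), and TSGR keeps the pairwise text-text similarity matrix stable across tasks. A minimal sketch of plausible loss forms follows; the exact formulations in the paper may differ.

```python
# Hypothetical loss sketches for ACGD and TSGR (assumed forms).
import torch
import torch.nn.functional as F

def acgd_loss(student_img, teacher_img, text_emb):
    """Anchor-guided cross-modal geometry distillation: match the
    anchor-to-text similarity matrix of the frozen teacher model."""
    t_emb = F.normalize(text_emb, dim=-1)
    s = F.normalize(student_img, dim=-1) @ t_emb.t()
    t = F.normalize(teacher_img, dim=-1) @ t_emb.t()
    return F.mse_loss(s, t)

def tsgr_loss(text_now, text_prev):
    """Text semantic-geometry regularization: keep the pairwise
    text-text similarity (Gram) matrix stable across tasks."""
    g_now = F.normalize(text_now, dim=-1) @ F.normalize(text_now, dim=-1).t()
    g_prev = F.normalize(text_prev, dim=-1) @ F.normalize(text_prev, dim=-1).t()
    return F.mse_loss(g_now, g_prev)
```

Both losses are zero when the current model reproduces the reference geometry exactly, and penalize relational drift rather than pinning individual embeddings in place.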
Problem

Research questions and friction points this paper is trying to address.

continual learning
vision-language models
catastrophic forgetting
semantic geometry
cross-modal alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Geometry Preservation
Vision-Language Models
Continual Learning
Cross-Modal Distillation
Exemplar-Free