CRISP: Contrastive Residual Injection and Semantic Prompting for Continual Video Instance Segmentation

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Continual video instance segmentation faces three key challenges: catastrophic forgetting in class-incremental learning, instance confusion, and temporal inconsistency. To address these, we propose a unified framework integrating contrastive learning with residual semantic prompting. Specifically, we introduce an instance association loss to enforce inter-frame consistency; design an adaptive residual semantic prompt pool for class-aware, learnable feature enhancement; and incorporate cross-task prompt initialization with query-prompt matching. Notably, we are the first to embed contrastive learning into semantic consistency constraints to jointly balance plasticity and stability. Our method achieves state-of-the-art performance on YouTube-VIS 2019 and 2021, outperforming existing continual learning approaches by up to +4.2% mAP in long-term incremental settings. The source code is publicly available.

📝 Abstract
Continual video instance segmentation demands both the plasticity to absorb new object categories and the stability to retain previously learned ones, all while preserving temporal consistency across frames. In this work, we introduce Contrastive Residual Injection and Semantic Prompting (CRISP), an early attempt to address instance-wise, category-wise, and task-wise confusion in continual video instance segmentation. For instance-wise learning, we model instance tracking and construct an instance correlation loss, which emphasizes correlation with the prior query space while strengthening the specificity of the current task's queries. For category-wise learning, we build an adaptive residual semantic prompt (ARSP) learning framework, which constructs a learnable semantic residual prompt pool generated from category text and uses an adaptive query-prompt matching mechanism to map the queries of the current task to the semantic residual prompts. In addition, a semantic consistency loss based on contrastive learning maintains semantic coherence between object queries and residual prompts during incremental training. For task-wise learning, to preserve inter-task correlation within the query space, we introduce a concise yet powerful initialization strategy for incremental prompts. Extensive experiments on the YouTube-VIS-2019 and YouTube-VIS-2021 datasets demonstrate that CRISP significantly outperforms existing continual segmentation methods on the long-term continual video instance segmentation task, mitigating catastrophic forgetting and improving segmentation and classification performance. The code is available at https://github.com/01upup10/CRISP.
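
The abstract does not give the exact form of the query-prompt matching or the semantic consistency loss, so the following is only a minimal sketch of how such a pairing could look, not the CRISP implementation. It assumes a top-1 cosine-similarity matching rule and an InfoNCE-style contrastive loss; the function names (`match_queries_to_prompts`, `semantic_consistency_loss`), the `temperature` value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def match_queries_to_prompts(queries, prompts):
    """Assumed matching rule: each query picks its most similar prompt."""
    return [
        max(range(len(prompts)), key=lambda j: cosine(q, prompts[j]))
        for q in queries
    ]


def semantic_consistency_loss(queries, prompts, positives, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's formula).

    Each query is pulled toward its positive prompt and pushed away
    from every other prompt in the pool.
    """
    total = 0.0
    for q, pos in zip(queries, positives):
        logits = [cosine(q, p) / temperature for p in prompts]
        # Numerically stable log-sum-exp over all prompts.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[pos]
    return total / len(queries)


# Toy data: two prompts (one per category) and two object queries.
prompts = [[1.0, 0.0], [0.0, 1.0]]
queries = [[0.9, 0.1], [0.2, 0.8]]

assignment = match_queries_to_prompts(queries, prompts)
loss_aligned = semantic_consistency_loss(queries, prompts, assignment)
loss_swapped = semantic_consistency_loss(queries, prompts, [1, 0])
```

In this toy setup the loss is lower when each query is paired with its matched prompt than when the pairing is swapped, which is the behavior a consistency constraint of this kind is meant to reward.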

Problem

Research questions and friction points this paper is trying to address.

Address instance-wise confusion in continual video instance segmentation
Resolve category-wise confusion via adaptive semantic prompts
Mitigate task-wise confusion with incremental prompt initialization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Instance tracking with an instance correlation loss for instance-wise learning
Adaptive Residual Semantic Prompt for category-wise learning
Initialization strategy for inter-task correlation
Baichen Liu
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
Qi Lyu
Master of Science, Michigan State University
Deep Learning, NLP
Xudong Wang
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; University of Chinese Academy of Sciences, Beijing 100049, China
Jiahua Dong
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Lianqing Liu
Professor, Shenyang Institute of Automation, Chinese Academy of Sciences
Biosyncretic Robot, Micro/Nano Robotics, Intelligent Machine
Zhi Han
SIA, CAS
Computer Vision