Context-Aware Video Instance Segmentation

📅 2024-07-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To improve the robustness of cross-frame instance association in Video Instance Segmentation (VIS) and Video Panoptic Segmentation (VPS), this paper proposes Context-Aware Video Instance Segmentation (CAVIS), a framework that jointly optimizes explicit local spatial context modeling and the temporal consistency of object-level features. Methodologically, its Context-Aware Instance Tracker (CAIT) employs a Transformer-based multi-frame feature fusion architecture and incorporates a Context-Gated Aggregation module to make better use of neighboring-region information, while a Prototypical Cross-frame Contrastive (PCC) loss drives a prototype memory bank for fine-grained temporal matching. Evaluated on all major VIS/VPS benchmarks, including the highly challenging OVIS, CAVIS achieves state-of-the-art performance, improving OVIS mAP by 3.2 points. It notably improves instance consistency under severe occlusion, small-object tracking, and complex motion.

📝 Abstract
In this paper, we introduce the Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. To efficiently extract and leverage this information, we propose the Context-Aware Instance Tracker (CAIT), which merges contextual data surrounding the instances with the core instance features to improve tracking accuracy. Additionally, we introduce the Prototypical Cross-frame Contrastive (PCC) loss, which ensures consistency in object-level features across frames, thereby significantly enhancing instance matching accuracy. CAVIS demonstrates superior performance over state-of-the-art methods on all benchmark datasets in video instance segmentation (VIS) and video panoptic segmentation (VPS). Notably, our method excels on the OVIS dataset, which is known for its particularly challenging videos.
Problem

Research questions and friction points this paper is trying to address.

Enhancing instance association using contextual information
Improving tracking accuracy with context-aware instance features
Ensuring cross-frame object feature consistency for better matching
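To make the association problem concrete, here is a minimal sketch of matching instances across two frames using embeddings that blend each object's own feature with a feature of its surroundings. It is an illustration only, not the paper's tracker: the blending weight `alpha`, the helper names, and the greedy matching step are all assumptions introduced here.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def context_aware_embedding(instance_feat, context_feat, alpha=0.5):
    """Blend an instance feature with its surrounding-context feature.

    `alpha` is a hypothetical mixing weight, not a parameter from the paper.
    """
    mixed = [(1 - alpha) * i + alpha * c for i, c in zip(instance_feat, context_feat)]
    return l2_normalize(mixed)

def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def associate(prev_embs, curr_embs):
    """Greedy one-to-one matching by descending similarity.

    Returns a dict mapping current-frame index -> previous-frame index.
    """
    pairs = sorted(
        ((cosine(p, c), pi, ci)
         for pi, p in enumerate(prev_embs)
         for ci, c in enumerate(curr_embs)),
        reverse=True,
    )
    used_prev, matches = set(), {}
    for _, pi, ci in pairs:
        if pi not in used_prev and ci not in matches:
            matches[ci] = pi
            used_prev.add(pi)
    return matches

# Two objects with identical appearance but different surroundings.
appearance = [1.0, 0.0, 0.0]
ctx_a, ctx_b = [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
prev_embs = [context_aware_embedding(appearance, ctx_a),
             context_aware_embedding(appearance, ctx_b)]
# In the current frame the detections arrive in the opposite order.
curr_embs = [context_aware_embedding(appearance, ctx_b),
             context_aware_embedding(appearance, ctx_a)]
matches = associate(prev_embs, curr_embs)
```

With `alpha=0` the two objects would have identical embeddings and the match would be ambiguous; the surrounding context is what separates them in this toy case. A real system would more likely use an optimal assignment (for example the Hungarian algorithm) than the greedy step shown here.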
Innovation

Methods, ideas, or system contributions that make the work stand out.

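Integrates contextual information adjacent to each object into instance association
Uses the Context-Aware Instance Tracker (CAIT) to merge surrounding context with core instance features for more accurate tracking
Applies the Prototypical Cross-frame Contrastive (PCC) loss to keep object-level features consistent across frames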
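The idea of a prototype-based cross-frame contrastive objective can be sketched as follows. This is a generic InfoNCE-style formulation written for illustration, not the paper's exact PCC loss: the per-instance prototype (mean of normalized per-frame embeddings), the `temperature` value, and the function names are all assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prototype_contrastive_loss(tracks, temperature=0.1):
    """Average contrastive loss over all per-frame embeddings.

    tracks: dict mapping instance id -> list of per-frame embeddings.
    Each embedding is pulled toward its own instance prototype and
    pushed away from the prototypes of the other instances.
    """
    ids = sorted(tracks)
    protos = {
        i: l2_normalize(mean_vector([l2_normalize(e) for e in tracks[i]]))
        for i in ids
    }
    total, count = 0.0, 0
    for pos, i in enumerate(ids):
        for emb in tracks[i]:
            e = l2_normalize(emb)
            logits = [dot(e, protos[j]) / temperature for j in ids]
            m = max(logits)  # subtract the max for a stable log-sum-exp
            log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_denom - logits[pos]
            count += 1
    return total / count

# Features that stay consistent across frames give a low loss ...
consistent = {0: [[1.0, 0.0], [1.0, 0.0]], 1: [[0.0, 1.0], [0.0, 1.0]]}
# ... while features that drift between frames give a high one.
drifting = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[0.0, 1.0], [1.0, 0.0]]}
loss_consistent = prototype_contrastive_loss(consistent)
loss_drifting = prototype_contrastive_loss(drifting)
```

In the drifting case both prototypes collapse to the same vector, so the loss cannot tell the instances apart and equals ln 2 for two instances. Minimizing such a loss therefore rewards embeddings that stay close to their own instance across frames, which is the property the PCC loss is described as enforcing.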
Seunghun Lee
Korea University
Jiwan Seo
Department of Electrical Engineering & Computer Science, DGIST, Daegu, Korea
Kiljoon Han
Department of Electrical Engineering & Computer Science, DGIST, Daegu, Korea
Minwoo Choi
Department of Electrical Engineering & Computer Science, DGIST, Daegu, Korea
S. Im
Department of Electrical Engineering & Computer Science, DGIST, Daegu, Korea