AutoQ-VIS: Improving Unsupervised Video Instance Segmentation via Automatic Quality Assessment

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video instance segmentation (VIS) faces significant challenges in unsupervised settings, including large domain gaps between synthetic and real-world data, and reliance on optical flow or manual annotations. To address these issues, we propose a quality-guided closed-loop self-training framework that enables unsupervised domain adaptation from synthetic to real-world videos. Our method introduces automated pseudo-label quality assessment and a progressive filtering mechanism, establishing a self-iterative “evaluate–filter–train” loop. Crucially, it eliminates the need for optical flow estimation and human annotations entirely. Evaluated on the YouTube-VIS 2019 validation set, our approach achieves 52.6 AP₅₀—surpassing the prior state-of-the-art unsupervised method VideoCutLER by 4.4 points. To the best of our knowledge, this is the first work to significantly narrow the synthetic-to-real domain gap in VIS under a zero-shot, annotation-free setting.
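The "evaluate–filter–train" loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: the function names (`generate_pseudo_labels`, `filter_by_quality`, `self_train`) and the rising-threshold schedule are assumptions standing in for the paper's quality assessor, progressive filter, and segmenter fine-tuning.

```python
import random

random.seed(0)  # deterministic toy run

def generate_pseudo_labels(model, videos):
    # Hypothetical stand-in: the model emits a mask plus a predicted
    # quality score per video (the paper's automatic quality assessment).
    return [{"video": v, "mask": f"mask_{v}", "quality": random.random()}
            for v in videos]

def filter_by_quality(labels, threshold):
    # Keep only pseudo-labels whose predicted quality clears the bar.
    return [lab for lab in labels if lab["quality"] >= threshold]

def self_train(model, kept):
    # Toy stand-in for fine-tuning on the retained pseudo-labels;
    # here "model" is just a counter of labels ever trained on.
    return model + len(kept)

def evaluate_filter_train(videos, rounds=3, start=0.5, step=0.1):
    """One possible reading of the closed loop: each round scores
    pseudo-labels, filters them, retrains, then raises the quality
    bar (progressive filtering)."""
    model, threshold = 0, start
    for _ in range(rounds):
        labels = generate_pseudo_labels(model, videos)   # evaluate
        kept = filter_by_quality(labels, threshold)      # filter
        model = self_train(model, kept)                  # train
        threshold = min(threshold + step, 0.95)          # tighten the bar
    return model, threshold
```

No annotations or optical flow enter the loop anywhere: the only training signal is the model's own filtered output, which is the point of the closed-loop design.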

📝 Abstract
Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporal consistency labels. While recent unsupervised methods like VideoCutLER eliminate optical flow dependencies through synthetic data, they remain constrained by the synthetic-to-real domain gap. We present AutoQ-VIS, a novel unsupervised framework that bridges this gap through quality-guided self-training. Our approach establishes a closed-loop system between pseudo-label generation and automatic quality assessment, enabling progressive adaptation from synthetic to real videos. Experiments demonstrate state-of-the-art performance with 52.6 $\text{AP}_{50}$ on the YouTube-VIS 2019 val set, surpassing the previous state-of-the-art VideoCutLER by 4.4 points, while requiring no human annotations. This demonstrates the viability of quality-aware self-training for unsupervised VIS. The source code of our method is available at https://github.com/wcbup/AutoQ-VIS.
Problem

Research questions and friction points this paper is trying to address.

Tackles the annotation burden of video instance segmentation in unsupervised settings
Bridges the synthetic-to-real domain gap in video data without optical flow or human labels
Improves pseudo-label quality via automatic assessment and progressive filtering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quality-guided self-training framework
Closed-loop pseudo-label quality assessment
Automatic synthetic-to-real video adaptation