Robust Promptable Video Object Segmentation

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the substantial performance degradation of promptable video object segmentation (PVOS) models under input corruptions, which hinders their deployment in safety-critical scenarios. It presents the first comprehensive study of robust PVOS and proposes Memory-object-conditioned Gated-rank Adaptation (MoGA), which uses object-specific representations maintained in memory across frames to condition the robustification process, so that each tracked object is handled differently and in a temporally consistent way. The accompanying benchmark comprises two real-world evaluation datasets with 351 video clips and more than 2,500 object masks captured under adverse conditions, plus synthetic training data generated by applying diverse, temporally varying corruptions to existing VOS datasets. Experiments show that MoGA yields consistent and significant improvements across diverse corruption types on both synthetic and real-world data, establishing a strong baseline for robust PVOS.
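The summary does not spell out how MoGA is built, so the following is only an illustrative sketch of one plausible reading of "memory-conditioned gated-rank adaptation": a LoRA-style low-rank update whose individual ranks are switched on or off by a gate computed from an object memory vector. The function name `gated_lowrank_forward`, the matrices `W`, `A`, `B`, `G`, and the sigmoid gate are all assumptions made for illustration, not the paper's formulation.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_lowrank_forward(x, W, A, B, G, memory):
    """Hypothetical gated low-rank adapter (not the paper's exact method).

    Computes y = W x + B (g * (A x)) with g = sigmoid(G memory).
    Shapes: W (d_out, d_in), A (r, d_in), B (d_out, r), G (r, d_mem).
    """
    gate = [sigmoid(z) for z in matvec(G, memory)]   # one gate per rank
    low = matvec(A, x)                               # project down to rank r
    gated = [g * h for g, h in zip(gate, low)]       # per-rank modulation
    base = matvec(W, x)                              # frozen base layer output
    delta = matvec(B, gated)                         # project back up
    return [b + d for b, d in zip(base, delta)], gate

# Toy example: rank-2 adapter on a 2-d feature, 2-d object memory vector.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
G = [[10.0, 0.0], [0.0, -10.0]]    # memory opens rank 0 and closes rank 1
x = [1.0, 2.0]
memory = [1.0, 1.0]

y, gate = gated_lowrank_forward(x, W, A, B, G, memory)

# With B set to zero the adapter is inactive and the base output is unchanged.
B_zero = [[0.0, 0.0], [0.0, 0.0]]
y_base, _ = gated_lowrank_forward(x, W, A, B_zero, G, memory)
```

Because the gate depends on a per-object memory vector, two objects tracked in the same frame would receive different effective updates in this sketch, which is the kind of object-specific behavior the summary describes.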

📝 Abstract

The performance of promptable video object segmentation (PVOS) models substantially degrades under input corruptions, which prevents PVOS deployment in safety-critical domains. This paper offers the first comprehensive study on robust PVOS (RobustPVOS). We first construct a new, comprehensive benchmark with two real-world evaluation datasets of 351 video clips and more than 2,500 object masks under real-world adverse conditions. At the same time, we generate synthetic training data by applying diverse and temporally varying corruptions to existing VOS datasets. Moreover, we present a new RobustPVOS method, dubbed Memory-object-conditioned Gated-rank Adaptation (MoGA). The key to successfully performing RobustPVOS is two-fold: effectively handling object-specific degradation and ensuring temporal consistency in predictions. MoGA leverages object-specific representations maintained in memory across frames to condition the robustification process, which allows the model to handle each tracked object differently in a temporally consistent way. Extensive experiments on our benchmark validate MoGA's efficacy, showing consistent and significant improvements across diverse corruption types on both synthetic and real-world datasets, establishing a strong baseline for future RobustPVOS research. Our benchmark is publicly available at https://sohyun-l.github.io/RobustPVOS_project_page/.
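The idea of "temporally varying corruptions" mentioned in the abstract can be illustrated with a minimal sketch, assuming a single corruption type (additive Gaussian noise) whose severity drifts smoothly from frame to frame. The sinusoidal schedule, the noise scale of 0.2, and the helper names `severity_schedule`, `corrupt_frame`, and `corrupt_video` are hypothetical choices for illustration; the abstract does not specify which corruptions or schedules the authors use.

```python
import math
import random

def severity_schedule(num_frames, max_severity=1.0, period=8.0, seed=0):
    """Return one severity in [0, max_severity] per frame, drifting smoothly."""
    rng = random.Random(seed)
    phase = rng.uniform(0.0, 2.0 * math.pi)   # random starting point in the cycle
    return [
        max_severity * 0.5 * (1.0 + math.sin(2.0 * math.pi * t / period + phase))
        for t in range(num_frames)
    ]

def corrupt_frame(frame, severity, rng):
    """Add Gaussian noise scaled by severity and clamp pixels to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, 0.2 * severity))) for px in row]
        for row in frame
    ]

def corrupt_video(frames, max_severity=1.0, seed=0):
    """Apply a per-frame severity so the corruption changes over time."""
    rng = random.Random(seed + 1)
    severities = severity_schedule(len(frames), max_severity, seed=seed)
    corrupted = [corrupt_frame(f, s, rng) for f, s in zip(frames, severities)]
    return corrupted, severities

# Toy grayscale video: 6 frames of 3x4 pixels with values in [0, 1].
video = [[[0.5] * 4 for _ in range(3)] for _ in range(6)]
corrupted, severities = corrupt_video(video, max_severity=1.0, seed=0)
```

Object masks are left untouched in this sketch: only the input frames are degraded, so the original annotations of an existing VOS dataset can still serve as supervision.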

Problem

Research questions and friction points this paper is trying to address.

promptable video object segmentation

robustness

input corruptions

temporal consistency

video object segmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust Promptable Video Object Segmentation

Temporal Consistency

Object-specific Representation