🤖 AI Summary
Unsupervised video object segmentation (VOS) suffers from the scarcity of ground-truth optical flow annotations and the limited performance of conventional two-stream methods. Method: This paper proposes DepthFlow, the first approach to exploit the strong structural correlation between depth and optical flow for salient objects. DepthFlow estimates per-frame depth from RGB input and synthesizes high-fidelity, structure-preserving optical flow via a geometrically informed flow-field transformation, thereby extending image-mask pairs into image-flow-mask triplets. The method employs an end-to-end trainable encoder-decoder architecture and requires no real optical flow supervision. Contribution/Results: DepthFlow achieves state-of-the-art performance across all major unsupervised VOS benchmarks, significantly outperforming existing two-stream approaches. Extensive experiments validate the effectiveness, generalizability, and practicality of depth-guided optical flow synthesis for unsupervised VOS.
📝 Abstract
Unsupervised video object segmentation (VOS) aims to detect the most prominent object in a video. Recently, two-stream approaches that leverage both RGB images and optical flow have gained significant attention, but their performance is fundamentally constrained by the scarcity of training data. To address this, we propose DepthFlow, a novel data generation method that synthesizes optical flow from single images. Our approach is driven by the key insight that VOS models depend more on the structural information embedded in flow maps than on their geometric accuracy, and that this structure is highly correlated with depth. We first estimate a depth map from a source image and then convert it into a synthetic flow field that preserves the essential structural cues. This process transforms large-scale image-mask pairs into image-flow-mask training triplets, dramatically expanding the data available for network training. By training a simple encoder-decoder architecture on our synthesized data, we achieve new state-of-the-art performance on all public VOS benchmarks, demonstrating a scalable and effective solution to the data scarcity problem.
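The depth-to-flow conversion described above can be illustrated with a minimal sketch. This is not the paper's actual transformation: it assumes a hypothetical small camera translation `t` and the standard parallax relation, under which the induced flow of a static scene is inversely proportional to depth. The point it demonstrates is the structural correlation the abstract relies on: depth discontinuities at object boundaries become flow discontinuities. The function name `depth_to_flow` and parameters `fx`, `fy`, `t` are illustrative assumptions.

```python
import numpy as np

def depth_to_flow(depth, t=(0.05, 0.0), fx=500.0, fy=500.0, eps=1e-6):
    """Convert a depth map (H, W) into a synthetic flow field (H, W, 2).

    Assumed parallax model (not the paper's exact formulation): for a
    hypothetical camera translation t = (tx, ty) and focal lengths
    (fx, fy), a static scene induces flow u = fx * tx / Z, v = fy * ty / Z.
    Nearer pixels (small Z) move more, so depth edges at object
    boundaries reappear as flow edges -- the structural cue a
    two-stream VOS model consumes.
    """
    inv_depth = 1.0 / np.maximum(depth, eps)  # guard against zero depth
    u = fx * t[0] * inv_depth
    v = fy * t[1] * inv_depth
    return np.stack([u, v], axis=-1)
```

A synthesized pair would then be built by pairing the source image and its mask with `depth_to_flow(estimated_depth)`, yielding the image-flow-mask triplet used for training.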