DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation

📅 2025-07-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Unsupervised video object segmentation (VOS) suffers from the scarcity of ground-truth optical flow annotations and the limited performance of conventional two-stream methods. Method: This paper proposes DepthFlow, the first approach to exploit the strong structural correlation between depth and optical flow for salient objects. DepthFlow estimates a per-frame depth map from RGB input and synthesizes high-fidelity, structure-preserving optical flow via a geometrically informed flow-field transformation, thereby extending image-mask pairs into image-flow-mask triplets. The method trains a simple end-to-end encoder-decoder architecture without requiring real optical flow supervision. Contribution/Results: DepthFlow achieves state-of-the-art performance across all major unsupervised VOS benchmarks, significantly outperforming existing two-stream approaches, and extensive experiments validate the effectiveness, generalizability, and practicality of depth-guided optical flow synthesis for unsupervised VOS.

๐Ÿ“ Abstract
Unsupervised video object segmentation (VOS) aims to detect the most prominent object in a video. Recently, two-stream approaches that leverage both RGB images and optical flow have gained significant attention, but their performance is fundamentally constrained by the scarcity of training data. To address this, we propose DepthFlow, a novel data generation method that synthesizes optical flow from single images. Our approach is driven by the key insight that VOS models depend more on structural information embedded in flow maps than on their geometric accuracy, and that this structure is highly correlated with depth. We first estimate a depth map from a source image and then convert it into a synthetic flow field that preserves essential structural cues. This process enables the transformation of large-scale image-mask pairs into image-flow-mask training pairs, dramatically expanding the data available for network training. By training a simple encoder-decoder architecture with our synthesized data, we achieve new state-of-the-art performance on all public VOS benchmarks, demonstrating a scalable and effective solution to the data scarcity problem.
Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in unsupervised video object segmentation
Synthesizing optical flow from single images using depth
Improving VOS performance with structural depth-flow correlations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthesizes optical flow from single images
Converts depth maps into synthetic flow fields
Expands training data with image-flow-mask pairs
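The exact flow-field transformation is not specified in this summary, but the depth-to-flow conversion the bullets describe can be sketched under the assumption of a simple inverse-depth parallax model, where nearer pixels move more under a small virtual camera translation. The function name `depth_to_flow` and the parameters `tx`, `ty` below are hypothetical, for illustration only:

```python
import numpy as np

def depth_to_flow(depth, tx=8.0, ty=2.0, eps=1e-6):
    """Convert a depth map (H, W) into a synthetic flow field (H, W, 2).

    Assumes an inverse-depth parallax model: apparent motion under a small
    camera translation is inversely proportional to depth, so the synthetic
    flow inherits the object boundaries present in the depth map.
    """
    inv_depth = 1.0 / (depth + eps)
    # Normalize to [0, 1] so flow magnitude is controlled by (tx, ty).
    inv_depth = (inv_depth - inv_depth.min()) / (inv_depth.max() - inv_depth.min() + eps)
    return np.stack([tx * inv_depth, ty * inv_depth], axis=-1)

# Stand-in for a monocular depth estimate of one training image
depth = np.random.rand(240, 320).astype(np.float32) + 0.1
flow = depth_to_flow(depth)  # synthetic flow field for an image-flow-mask triplet
print(flow.shape)  # (240, 320, 2)
```

The geometric accuracy of such a field is crude, but per the abstract that is acceptable: the VOS model relies on the structural cues in the flow map (object boundaries, relative motion contrast), which the depth map already provides.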