🤖 AI Summary
Video salient object detection (VSOD) relies heavily on authentic motion cues, yet suffers from a severe scarcity of annotated video data. Existing methods that synthesize pseudo-video sequences from static images fail to produce semantically coherent, temporally consistent optical flow, resulting in poor motion-guided detection performance. To address this, we propose TransFlow, the first framework to transfer semantic motion priors from a pre-trained video diffusion model to VSOD. Our method conditions optical flow generation on a single input image, explicitly decoupling content and motion representations to synthesize physically plausible, scene-aware flow fields and the corresponding training video sequences. Unlike conventional spatial-transformation-based approaches, our method overcomes their inherent limitations in motion realism. Extensive experiments show significant improvements across multiple VSOD benchmarks, validating both the effectiveness and the generalizability of motion knowledge transfer.
📝 Abstract
Video salient object detection (VSOD) relies on motion cues to distinguish salient objects from their backgrounds, but training such models is constrained by the scarcity of annotated video datasets relative to abundant image datasets. Existing approaches that apply spatial transformations to static images to create video sequences fail on motion-guided tasks, because these transformations produce unrealistic optical flows that lack any semantic understanding of motion. We present TransFlow, which transfers motion knowledge from pre-trained video diffusion models to generate realistic training data for VSOD. Video diffusion models have learned rich semantic motion priors from large-scale video data, capturing how different objects naturally move in real scenes. TransFlow leverages this knowledge to generate semantically aware optical flows from static images, in which objects exhibit natural motion patterns while preserving spatial boundaries and temporal coherence. Our method achieves improved performance across multiple benchmarks, demonstrating effective motion knowledge transfer.
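To make the data-synthesis idea concrete: once a flow field has been generated for a static image, a pseudo next frame can be produced by warping the image along that flow. The sketch below is a minimal, dependency-free illustration under stated assumptions, not TransFlow's actual pipeline: the uniform flow stands in for a diffusion-sampled flow field, and nearest-neighbour backward warping is chosen only for brevity.

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp an image with a dense optical flow field.

    image: (H, W, C) array; flow: (H, W, 2) array of per-pixel
    (dx, dy) displacements, e.g. sampled from a flow-generating model.
    Nearest-neighbour sampling keeps this sketch dependency-free.
    """
    H, W = image.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    # Backward warping: each target pixel pulls from (x - dx, y - dy).
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, H - 1)
    return image[src_y, src_x]

# Stand-in for a diffusion-sampled flow: a uniform rightward drift.
H, W = 64, 64
image = np.random.rand(H, W, 3)
flow = np.zeros((H, W, 2))
flow[..., 0] = 3.0  # every pixel moves 3 pixels to the right

next_frame = warp_with_flow(image, flow)  # one synthesized frame
```

Repeating this step with a sequence of flows yields a pseudo-video clip; in practice a bilinear sampler (e.g. `torch.nn.functional.grid_sample`) would replace the nearest-neighbour lookup.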