MOVE: Motion-Guided Few-Shot Video Object Segmentation

📅 2025-07-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing few-shot video object segmentation (FSVOS) methods neglect motion dynamics and over-rely on static category priors. To address this, we propose a motion-guided FSVOS paradigm. First, we introduce MOVE, the first few-shot video segmentation benchmark explicitly designed for motion pattern modeling. Second, we design the Decoupled Motion Appearance Network (DMA), which explicitly separates and models motion cues and appearance representations via optical flow alignment, temporal modeling, and multi-scale feature fusion. Third, we systematically evaluate six state-of-the-art methods from three related tasks under two experimental settings, showing that current methods struggle with motion-guided FSVOS and that DMA outperforms them on MOVE. This work integrates motion awareness into few-shot video object segmentation, contributing both a new benchmark and a baseline method for dynamic scene understanding.

📝 Abstract

This work addresses motion-guided few-shot video object segmentation (FSVOS), which aims to segment dynamic objects in videos based on a few annotated examples with the same motion patterns. Existing FSVOS datasets and methods typically focus on object categories, which are static attributes that ignore the rich temporal dynamics in videos, limiting their application in scenarios requiring motion understanding. To fill this gap, we introduce MOVE, a large-scale dataset specifically designed for motion-guided FSVOS. Based on MOVE, we comprehensively evaluate 6 state-of-the-art methods from 3 different related tasks across 2 experimental settings. Our results reveal that current methods struggle to address motion-guided FSVOS, prompting us to analyze the associated challenges and propose a baseline method, Decoupled Motion Appearance Network (DMA). Experiments demonstrate that our approach achieves superior performance in few-shot motion understanding, establishing a solid foundation for future research in this direction.

Problem

Research questions and friction points this paper is trying to address.

Segment dynamic objects using few motion-guided examples

Overcome limitations of static category-focused FSVOS methods

Address lack of datasets for motion-guided video segmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces MOVE dataset for motion-guided FSVOS

Proposes Decoupled Motion Appearance Network (DMA)

Evaluates 6 state-of-the-art methods comprehensively
