🤖 AI Summary
This work addresses the problem of robust trajectory tracking for arbitrary points in multi-view videos of dynamic scenes. We propose a novel method that jointly leverages geometric camera constraints and cross-view spatio-temporal attention: it explicitly models multi-camera projection geometry and introduces a learnable cross-view attention module to aggregate consistent, viewpoint-invariant features. To enable end-to-end training and rigorous generalization evaluation, we construct a large-scale synthetic training dataset and a real-world benchmark with ground-truth trajectories. Our approach achieves state-of-the-art accuracy and robustness on challenging benchmarks, including MVS1K and DynamicScene, outperforming prior methods by significant margins. Notably, it establishes the first strong baseline for multi-view point tracking, introducing a new paradigm for dynamic scene understanding and 3D motion analysis.
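To make the cross-view aggregation concrete, here is a minimal PyTorch sketch of a cross-view attention block. The module name `CrossViewAttention`, the token layout, and all dimensions are illustrative assumptions rather than MV-TAP's actual implementation; the point is only that each tracked point contributes one feature token per view, and attention mixes those tokens into a viewpoint-consistent feature.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Illustrative sketch: aggregate per-view point features across V views.

    Assumption: MV-TAP's real module may differ in tokenization,
    positional encoding, and normalization.
    """
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B*T, V, C) -- one token per view for each tracked point
        # at each timestep; self-attention mixes information across views.
        attended, _ = self.attn(feats, feats, feats)
        return self.norm(feats + attended)  # residual connection + norm

# Usage: 2 point tracks, 8 frames, 4 views, 256 channels (hypothetical sizes)
x = torch.randn(2 * 8, 4, 256)
y = CrossViewAttention()(x)
assert y.shape == x.shape
```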
📝 Abstract
Multi-view camera systems enable rich observations of complex real-world scenes, and understanding dynamic objects in multi-view settings has become central to various applications. In this work, we present MV-TAP, a novel point tracker that tracks points across multi-view videos of dynamic scenes by leveraging cross-view information. MV-TAP utilizes camera geometry and a cross-view attention mechanism to aggregate spatio-temporal information across views, enabling more complete and reliable trajectory estimation in multi-view videos. To support this task, we construct a large-scale synthetic training dataset and real-world evaluation sets tailored for multi-view tracking. Extensive experiments demonstrate that MV-TAP outperforms existing point-tracking methods on challenging benchmarks, establishing an effective baseline for advancing research in multi-view point tracking.
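As a rough illustration of the camera-geometry side, the sketch below projects a 3D point into a camera view with a standard pinhole model. The function name and the toy intrinsics are assumptions for illustration only; the paper's exact geometric formulation is not given here.

```python
import numpy as np

def project_point(X_world, K, R, t):
    """Pinhole projection of a 3D world point into one camera view.

    X_world: (3,) point in world coordinates
    K: (3,3) intrinsics, R: (3,3) rotation, t: (3,) translation
    Returns (2,) pixel coordinates.
    """
    X_cam = R @ X_world + t   # world frame -> camera frame
    x = K @ X_cam             # camera frame -> homogeneous image coords
    return x[:2] / x[2]       # perspective divide

# Toy example (hypothetical values): camera at the origin, 500 px focal length
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
uv = project_point(np.array([0.1, -0.2, 2.0]), K, np.eye(3), np.zeros(3))
print(uv)  # -> approximately [345., 190.]
```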