AI Summary
This work addresses the problem of continuous spatiotemporal dynamic modeling in videos, aiming to jointly represent and predict pixel-wise 3D motion trajectories. To this end, we introduce the **Trajectory Field**: a novel 4D spatiotemporal implicit representation that models each pixel's motion as a continuous 3D trajectory. Our method parameterizes trajectories using B-splines and employs a neural network to predict per-pixel control points in a single forward pass, enabling efficient full-video trajectory synthesis. The framework inherently supports emergent capabilities including target-guided tracking, long-horizon motion prediction, and spatiotemporal information fusion. We establish a new Trajectory Field benchmark and evaluate on standard point-tracking tasks, achieving state-of-the-art performance without iterative optimization. Our approach significantly improves both inference efficiency and accuracy, demonstrating superior generalization and scalability across diverse motion patterns.
Abstract
Effective spatio-temporal representation is fundamental to modeling, understanding, and predicting dynamics in videos. The atomic unit of a video, the pixel, traces a continuous 3D trajectory over time, serving as the primitive element of dynamics. Based on this principle, we propose representing any video as a Trajectory Field: a dense mapping that assigns a continuous 3D trajectory function of time to each pixel in every frame. With this representation, we introduce Trace Anything, a neural network that predicts the entire trajectory field in a single feed-forward pass. Specifically, for each pixel in each frame, our model predicts a set of control points that parameterizes a trajectory (i.e., a B-spline), yielding its 3D position at arbitrary query time instants. We train the Trace Anything model on large-scale 4D data, including data from our new platform, and our experiments demonstrate that: (i) Trace Anything achieves state-of-the-art performance on our new benchmark for trajectory field estimation and performs competitively on established point-tracking benchmarks; (ii) it offers significant efficiency gains thanks to its one-pass paradigm, without requiring iterative optimization or auxiliary estimators; and (iii) it exhibits emergent abilities, including goal-conditioned manipulation, motion forecasting, and spatio-temporal fusion. Project page: https://trace-anything.github.io/.
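To make the trajectory parameterization concrete, the sketch below evaluates a 3D B-spline trajectory from a set of control points via de Boor's algorithm, so a pixel's 3D position can be queried at any normalized time t in [0, 1]. This is an illustrative assumption, not the paper's exact formulation: the function name, spline degree, and clamped uniform knot layout are all choices made here for the example.

```python
import numpy as np

def bspline_point(control_points, t, degree=3):
    """Evaluate a clamped uniform B-spline at parameter t in [0, 1].

    control_points: (K, 3) array of predicted 3D control points, K >= degree + 1.
    Returns the 3D position on the trajectory at time t (de Boor's algorithm).
    Illustrative sketch only; the model's actual parameterization may differ.
    """
    control_points = np.asarray(control_points, dtype=float)
    K = len(control_points)
    # Clamped uniform knot vector on [0, 1]: degree+1 repeated end knots.
    n_inner = K - degree - 1
    knots = np.concatenate([
        np.zeros(degree + 1),
        np.linspace(0.0, 1.0, n_inner + 2)[1:-1],
        np.ones(degree + 1),
    ])
    # Find the knot span k with knots[k] <= t < knots[k+1] (clamp t inside).
    t = min(max(t, 0.0), 1.0 - 1e-9)
    k = int(np.searchsorted(knots, t, side="right")) - 1
    # de Boor recursion over the degree+1 active control points.
    d = [control_points[j + k - degree].copy() for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            denom = knots[i + degree - r + 1] - knots[i]
            alpha = (t - knots[i]) / denom if denom > 0 else 0.0
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[degree]
```

With exactly degree + 1 control points, a clamped B-spline reduces to a Bezier curve, which makes the endpoint behavior easy to check: the trajectory starts at the first control point and ends at the last.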