Forecasting Motion in the Wild

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current visual systems lack a general, structured motion representation for non-rigid agents such as wild animals, which makes their complex behavior hard to predict. This work proposes dense point trajectories as category-agnostic visual motion tokens, a mid-level representation that disentangles motion from appearance. It introduces a diffusion transformer that operates on unordered sets of trajectories and explicitly models occlusion. For evaluation, the authors curate 300 hours of in-the-wild animal video, using shot boundary detection and camera-motion compensation. On this data the method outperforms existing baselines and shows strong generalization, data efficiency, and applicability to rare species and diverse morphologies.
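The summary treats trajectories as tokens that form an unordered set, with occlusion handled explicitly. The page gives no architectural details, so what follows is only an illustration under assumptions. Each track is flattened into one token with a per-frame visibility flag, and occluded positions are zeroed. A single-head self-attention layer with identity projections stands in for the transformer. Both helpers, `track_to_token` and `self_attention`, are hypothetical. Because no positional encoding is attached to individual tracks, permuting the input tracks only permutes the outputs. That property is what lets such a model treat trajectories as a set.

```python
import math
from typing import List, Tuple

# One point track over T frames: (x, y, visible), where visible is 1.0 if the point
# is observed in that frame and 0.0 if it is occluded.
Track = List[Tuple[float, float, float]]


def track_to_token(track: Track) -> List[float]:
    """Flatten one track into a single feature vector (one token per track).

    Positions at occluded steps are zeroed while the visibility flag is kept, so a
    model can tell "hidden" apart from "observed" (an assumed encoding, not the paper's).
    """
    token: List[float] = []
    for x, y, vis in track:
        token.extend([x * vis, y * vis, vis])
    return token


def self_attention(tokens: List[List[float]]) -> List[List[float]]:
    """Single-head dot-product self-attention with identity Q/K/V projections.

    No positional encoding is added across tracks, so permuting the input tokens
    only permutes the outputs (permutation equivariance).
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * key[i] for w, key in zip(weights, tokens)) / total for i in range(dim)]
        )
    return outputs


# Toy data: three tracks over three frames; the first point is occluded in frame 3.
tracks = [
    [(0.0, 0.0, 1.0), (1.0, 0.5, 1.0), (2.0, 1.0, 0.0)],
    [(5.0, 5.0, 1.0), (5.0, 4.0, 1.0), (5.0, 3.0, 1.0)],
    [(2.0, 8.0, 1.0), (3.0, 8.0, 1.0), (4.0, 8.0, 1.0)],
]
tokens = [track_to_token(t) for t in tracks]
out = self_attention(tokens)

# Feed the same tracks in a different order.
perm = [2, 0, 1]
out_perm = self_attention([tokens[i] for i in perm])
# Output j of the permuted set matches output perm[j] of the original set
# (up to floating-point rounding from the different summation order).
```

A real model would add learned projections, multiple heads, and a diffusion timestep input. None of these changes the equivariance argument, provided no per-track index is encoded.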
📝 Abstract
Visual intelligence requires anticipating the future behavior of agents, yet vision systems lack a general representation for motion and behavior. We propose dense point trajectories as visual tokens for behavior, a structured mid-level representation that disentangles motion from appearance and generalizes across diverse non-rigid agents, such as animals in-the-wild. Building on this abstraction, we design a diffusion transformer that models unordered sets of trajectories and explicitly reasons about occlusion, enabling coherent forecasts of complex motion patterns. To evaluate at scale, we curate 300 hours of unconstrained animal video with robust shot detection and camera-motion compensation. Experiments show that forecasting trajectory tokens achieves category-agnostic, data-efficient prediction, outperforms state-of-the-art baselines, and generalizes to rare species and morphologies, providing a foundation for predictive visual intelligence in the wild.
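Camera-motion compensation separates how the animal moves from how the camera moves. The abstract does not state which camera model is used. The sketch below therefore assumes a pure per-frame image translation, estimated as the median displacement of visible tracked points relative to the first frame. `compensate_translation` is a hypothetical name, and the tracks are toy data.

```python
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def compensate_translation(
    tracks: List[List[Point]], visible: List[List[bool]]
) -> List[List[Point]]:
    """Remove an assumed translation-only camera motion from a set of point tracks.

    tracks[n][t] is the (x, y) position of point n in frame t, and visible[n][t]
    says whether it was observed there. The camera shift at frame t is the median
    displacement since frame 0 over points visible in both frames, which tolerates
    a minority of independently moving (e.g. animal) points. Frames with no usable
    points fall back to a zero shift.
    """
    num_frames = len(tracks[0])
    shifts: List[Point] = []
    for t in range(num_frames):
        usable = [tr for tr, vis in zip(tracks, visible) if vis[0] and vis[t]]
        if usable:
            shifts.append((median([tr[t][0] - tr[0][0] for tr in usable]),
                           median([tr[t][1] - tr[0][1] for tr in usable])))
        else:
            shifts.append((0.0, 0.0))
    # Subtract the estimated shift of each frame from every point in that frame.
    return [[(x - sx, y - sy) for (x, y), (sx, sy) in zip(tr, shifts)] for tr in tracks]


# Toy data: camera motion makes the three background points drift 2 px/frame in x;
# the animal point drifts 3 px/frame, i.e. it moves 1 px/frame relative to the scene.
frames = range(4)
tracks = [
    [(0.0 + 2 * t, 0.0) for t in frames],
    [(5.0 + 2 * t, 3.0) for t in frames],
    [(8.0 + 2 * t, 1.0) for t in frames],
    [(1.0 + 3 * t, 4.0) for t in frames],  # the animal
]
visible = [[True] * 4 for _ in tracks]
stabilized = compensate_translation(tracks, visible)
# stabilized[3] == [(1.0, 4.0), (2.0, 4.0), (3.0, 4.0), (4.0, 4.0)]
```

The median works as long as most tracked points lie on the static background. Footage with zoom or rotation would need a richer global model, for example a homography fitted with RANSAC. Whether the authors use something like that is not stated here.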
Problem

Research questions and friction points this paper is trying to address.

motion forecasting
visual representation
non-rigid agents
predictive visual intelligence
in-the-wild behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

dense point trajectories
diffusion transformer
motion forecasting
occlusion reasoning
category-agnostic prediction
Neerja Thakkar
UC Berkeley
Shiry Ginosar
Assistant Professor, TTIC
Computer Science, Computer Vision
Jacob Walker
Google DeepMind
Jitendra Malik
UC Berkeley
Joao Carreira
Google DeepMind
Carl Doersch
Google DeepMind
Computer Vision, Machine Learning