TrajLoom: Dense Future Trajectory Generation from Video

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes TrajLoom, a method that predicts dense, long-horizon future point trajectories and their visibility from observed video to support video understanding and controllable generation. To mitigate positional bias, the approach introduces Grid-Anchor Offset Encoding; TrajLoom-VAE then learns a structured latent trajectory space through masked reconstruction and spatiotemporal consistency regularization. Long-range generation stability is achieved via TrajLoom-Flow, which leverages flow matching, boundary-aware prompts, and on-policy K-step fine-tuning. The study also establishes TrajLoomBench, a unified benchmark for evaluation. Experiments show that TrajLoom extends the prediction horizon from 24 to 81 frames while significantly improving trajectory realism and temporal stability across multiple datasets, and the generated trajectories are directly applicable to downstream video synthesis and editing tasks.
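The Grid-Anchor Offset Encoding described above — representing each point as an offset from its pixel-center anchor — can be made concrete with a short sketch. The tensor layout, coordinate convention, normalization, and function names below are assumptions for illustration, not the paper's released code.

```python
import torch

def grid_anchor_offset_encode(points: torch.Tensor, H: int, W: int):
    """Split absolute point coordinates into (anchor, offset) pairs.

    A minimal sketch of Grid-Anchor Offset Encoding as described in the
    abstract; the normalization scheme is an assumption.

    points: (..., 2) absolute (x, y) pixel coordinates.
    Returns (anchors_norm, offsets) with
    points == anchors_norm * [W, H] + offsets.
    """
    # Pixel-center anchor of the cell containing each point:
    # a point at x = 3.7 anchors to x = 3.5.
    anchors = torch.floor(points) + 0.5
    # Offsets are bounded in [-0.5, 0.5), which removes the
    # location-dependent (absolute-position) component of the signal.
    offsets = points - anchors
    # Normalize anchors to [0, 1] so the encoding is resolution-agnostic.
    scale = points.new_tensor([W, H])
    return anchors / scale, offsets

def grid_anchor_offset_decode(anchors_norm: torch.Tensor,
                              offsets: torch.Tensor, H: int, W: int):
    """Invert the encoding back to absolute pixel coordinates."""
    scale = anchors_norm.new_tensor([W, H])
    return anchors_norm * scale + offsets
```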

📝 Abstract
Predicting future motion is crucial in video understanding and controllable video generation. Dense point trajectories are a compact, expressive motion representation, but modeling their future evolution from observed video remains challenging. We propose a framework that predicts future trajectories and visibility from past trajectories and video context. Our method has three components: (1) Grid-Anchor Offset Encoding, which reduces location-dependent bias by representing each point as an offset from its pixel-center anchor; (2) TrajLoom-VAE, which learns a compact spatiotemporal latent space for dense trajectories with masked reconstruction and a spatiotemporal consistency regularizer; and (3) TrajLoom-Flow, which generates future trajectories in latent space via flow matching, with boundary cues and on-policy K-step fine-tuning for stable sampling. We also introduce TrajLoomBench, a unified benchmark spanning real and synthetic videos with a standardized setup aligned with video-generation benchmarks. Compared with state-of-the-art methods, our approach extends the prediction horizon from 24 to 81 frames while improving motion realism and stability across datasets. The predicted trajectories directly support downstream video generation and editing. Code, model checkpoints, and datasets are available at https://trajloom.github.io/.
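TrajLoom-Flow generates future trajectories in latent space via flow matching. The sketch below shows one generic conditional flow-matching training step over such latents; the linear interpolation path, the `model(zt, t, cond)` signature, and the conditioning interface are assumptions, and the paper's boundary cues and on-policy K-step fine-tuning are not reproduced here.

```python
import torch
import torch.nn as nn

def flow_matching_step(model: nn.Module,
                       z1: torch.Tensor,
                       cond: torch.Tensor) -> torch.Tensor:
    """One conditional flow-matching training step (a minimal sketch).

    z1:   (B, D) clean future-trajectory latents, e.g. from TrajLoom-VAE.
    cond: conditioning features (past trajectories + video context);
          the exact interface is an assumption.
    """
    z0 = torch.randn_like(z1)                      # noise endpoint
    t = torch.rand(z1.shape[0], device=z1.device)  # time in [0, 1]
    t_ = t.view(-1, *([1] * (z1.dim() - 1)))       # broadcast over features
    zt = (1.0 - t_) * z0 + t_ * z1                 # linear probability path
    v_target = z1 - z0                             # constant target velocity
    v_pred = model(zt, t, cond)                    # predicted velocity field
    return torch.mean((v_pred - v_target) ** 2)
```

At inference, sampling integrates the learned velocity field from noise (t = 0) to data (t = 1), e.g. with a few Euler steps; the on-policy K-step fine-tuning mentioned in the abstract targets the stability of this sampling loop, and its details are in the paper rather than sketched here.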
Problem

Research questions and friction points this paper is trying to address.

dense trajectory prediction
future motion modeling
video understanding
controllable video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dense trajectory prediction
flow matching
spatiotemporal latent space
trajectory generation
video motion modeling