OmniTraj: Pre-Training on Heterogeneous Data for Adaptive and Zero-Shot Human Trajectory Prediction

📅 2025-07-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the significant performance degradation of human trajectory prediction models under zero-shot cross-dataset transfer, primarily caused by temporal dynamic discrepancies—such as varying frame rates and observation durations—across datasets. To mitigate this, we propose a generalization framework based on explicit temporal metadata-conditioned modeling. Our core innovation lies in incorporating time-related attributes (e.g., frame rate) as learnable conditional inputs into a Transformer architecture, thereby decoupling temporal dynamics modeling from spatial behavioral modeling and alleviating distribution shift effects. The model is pre-trained on large-scale heterogeneous trajectory data and achieves strong zero-shot transfer without requiring target-domain annotations. Experiments across four major benchmarks—NBA, JTA, WorldPose, and ETH-UCY—demonstrate that our method reduces average displacement error by over 70% under zero-shot transfer. Moreover, with only lightweight fine-tuning, it attains state-of-the-art performance.
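The core mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: the frame rate is encoded (here with a sinusoidal embedding; the paper describes learnable conditional inputs) and attached to every observation token before the sequence enters the Transformer encoder. All names and dimensions below are illustrative assumptions.

```python
# Hypothetical sketch of frame-rate conditioning (not the OmniTraj code).
import math

def fps_embedding(fps, dim=8):
    """Sinusoidal embedding of the frame rate, so the encoder can
    distinguish e.g. a 2.5 Hz stream from a 25 Hz one."""
    emb = []
    for i in range(dim // 2):
        freq = fps / (10.0 ** (2.0 * i / dim))
        emb += [math.sin(freq), math.cos(freq)]
    return emb

def condition_tokens(traj, fps, dim=8):
    """Append the frame-rate embedding to each (x, y) observation token
    before feeding the sequence to a Transformer encoder."""
    cond = fps_embedding(fps, dim)
    return [[x, y] + cond for (x, y) in traj]

traj = [(0.0, 0.0), (0.4, 0.1), (0.8, 0.2)]  # observed positions (m)
tokens = condition_tokens(traj, fps=2.5)
# each token now carries spatial coordinates plus temporal metadata
```

Because the temporal setup is an explicit input rather than something baked into the learned weights, the same pre-trained model can, in principle, be queried at an unseen frame rate at test time, which is the zero-shot behavior the paper targets.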

📝 Abstract
While large-scale pre-training has advanced human trajectory prediction, a critical challenge remains: zero-shot transfer to unseen datasets with varying temporal dynamics. State-of-the-art pre-trained models often require fine-tuning to adapt to new datasets with different frame rates or observation horizons, limiting their scalability and practical utility. In this work, we systematically investigate this limitation and propose a robust solution. We first demonstrate that existing data-aware discrete models struggle when transferred to new scenarios with shifted temporal setups. We then isolate temporal generalization from dataset shift, revealing that a simple, explicit conditioning mechanism for temporal metadata is a highly effective solution. Based on this insight, we present OmniTraj, a Transformer-based model pre-trained on a large-scale, heterogeneous dataset. Our experiments show that explicitly conditioning on the frame rate enables OmniTraj to achieve state-of-the-art zero-shot transfer performance, reducing prediction error by over 70% in challenging cross-setup scenarios. After fine-tuning, OmniTraj achieves state-of-the-art results on four datasets: NBA, JTA, WorldPose, and ETH-UCY. The code is publicly available: https://github.com/vita-epfl/omnitraj
Problem

Research questions and friction points this paper is trying to address.

Zero-shot transfer to unseen datasets with varying temporal dynamics
Adapting pre-trained models to new frame rates without fine-tuning
Improving human trajectory prediction accuracy in cross-setup scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based model pre-trained on heterogeneous data
Explicit conditioning on temporal metadata for adaptation
Zero-shot transfer with reduced prediction error
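The reported zero-shot gains are measured in average displacement error (ADE), the standard trajectory-prediction metric: the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon. A minimal sketch of the metric (illustrative helper, not from the paper's codebase):

```python
# Minimal ADE sketch; `ade` is an illustrative helper name.
import math

def ade(pred, gt):
    """Average Displacement Error: mean Euclidean distance between
    predicted and ground-truth positions over the horizon."""
    assert len(pred) == len(gt) and len(pred) > 0
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

pred = [(1.0, 0.0), (2.0, 0.0)]
gt   = [(1.0, 0.0), (2.0, 1.0)]
# per-step distances are 0.0 and 1.0, so ADE = 0.5
```

A "70% reduction" in this section thus means the mean per-step positional error under zero-shot transfer drops to less than a third of the baseline's.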
Yang Gao
Visual Intelligence for Transportation (VITA) laboratory, EPFL, Switzerland
Po-Chien Luan
PhD Student, EPFL
Deep Learning, Robotics
Kaouther Messaoud
Visual Intelligence for Transportation (VITA) laboratory, EPFL, Switzerland
Lan Feng
Ph.D. Student, EPFL
AI, Autonomous Driving
Alexandre Alahi
Professor, EPFL
Computer Vision, Transportation, Autonomous Driving, Intelligent Transportation Systems, AI