🤖 AI Summary
This paper addresses the significant performance degradation of human trajectory prediction models under zero-shot cross-dataset transfer, which is primarily caused by discrepancies in temporal dynamics across datasets, such as varying frame rates and observation durations. To mitigate this, the authors propose a generalization framework based on explicit conditioning on temporal metadata. The core innovation is incorporating time-related attributes (e.g., frame rate) as learnable conditional inputs to a Transformer architecture, thereby decoupling temporal dynamics modeling from spatial behavior modeling and alleviating the effects of distribution shift. The model is pre-trained on large-scale heterogeneous trajectory data and achieves strong zero-shot transfer without requiring target-domain annotations. Experiments on four major benchmarks (NBA, JTA, WorldPose, and ETH-UCY) demonstrate that the method reduces average displacement error by over 70% under zero-shot transfer. Moreover, with only lightweight fine-tuning, it attains state-of-the-art performance.
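The conditioning idea described above can be sketched minimally: embed the frame rate as a learned vector and prepend it as an extra token to the trajectory sequence, so that attention can read the temporal metadata alongside the observed positions. The embedding table and dimensions below are illustrative assumptions, not the paper's actual implementation, and random vectors stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Hypothetical embedding table for a few supported frame rates; in a real
# model these vectors would be learned jointly with the Transformer.
fps_embed = {fps: rng.standard_normal(d_model) for fps in (2.5, 5.0, 25.0)}

def condition_on_fps(traj_tokens: np.ndarray, fps: float) -> np.ndarray:
    """Prepend a frame-rate token so attention layers can condition on it."""
    cond = fps_embed[fps][None, :]           # shape (1, d_model)
    return np.concatenate([cond, traj_tokens], axis=0)

# Example: 8 observed trajectory steps sampled at 2.5 FPS.
obs_tokens = rng.standard_normal((8, d_model))
tokens = condition_on_fps(obs_tokens, 2.5)
print(tokens.shape)  # (9, 16): one conditioning token plus eight steps
```

Because the frame rate is an explicit input rather than implicit in the data spacing, the same weights can be applied at test time to a dataset with a different sampling rate simply by swapping the conditioning token.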
📝 Abstract
While large-scale pre-training has advanced human trajectory prediction, a critical challenge remains: zero-shot transfer to unseen datasets with varying temporal dynamics. State-of-the-art pre-trained models often require fine-tuning to adapt to new datasets with different frame rates or observation horizons, limiting their scalability and practical utility. In this work, we systematically investigate this limitation and propose a robust solution. We first demonstrate that existing data-aware discrete models struggle when transferred to new scenarios with shifted temporal setups. We then isolate temporal generalization from dataset shift, revealing that a simple, explicit conditioning mechanism for temporal metadata is a highly effective solution. Based on this insight, we present OmniTraj, a Transformer-based model pre-trained on a large-scale, heterogeneous dataset. Our experiments show that explicitly conditioning on the frame rate enables OmniTraj to achieve state-of-the-art zero-shot transfer performance, reducing prediction error by over 70% in challenging cross-setup scenarios. After fine-tuning, OmniTraj achieves state-of-the-art results on four datasets: NBA, JTA, WorldPose, and ETH-UCY. The code is publicly available: https://github.com/vita-epfl/omnitraj