🤖 AI Summary
This paper addresses the significant performance degradation of human trajectory prediction models under zero-shot cross-dataset transfer, which is primarily caused by discrepancies in temporal dynamics across datasets, such as varying frame rates and observation durations. To mitigate this, the authors propose a generalization framework based on explicit conditioning on temporal metadata. The core innovation is incorporating time-related attributes (e.g., frame rate) as learnable conditional inputs to a Transformer architecture, thereby decoupling temporal dynamics modeling from spatial behavior modeling and alleviating the effects of distribution shift. The model is pre-trained on large-scale heterogeneous trajectory data and achieves strong zero-shot transfer without requiring target-domain annotations. Experiments on four major benchmarks (NBA, JTA, WorldPose, and ETH-UCY) demonstrate that the method reduces average displacement error by over 70% under zero-shot transfer. Moreover, with only lightweight fine-tuning, it attains state-of-the-art performance.
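The conditioning idea described above can be sketched minimally: embed the frame rate as a learned vector and prepend it as an extra token to the trajectory sequence, so that attention can read the temporal metadata alongside the observed positions. The embedding table and dimensions below are illustrative assumptions, not the paper's actual implementation, and random vectors stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Hypothetical embedding table for a few supported frame rates; in a real
# model these vectors would be learned jointly with the Transformer.
fps_embed = {fps: rng.standard_normal(d_model) for fps in (2.5, 5.0, 25.0)}

def condition_on_fps(traj_tokens: np.ndarray, fps: float) -> np.ndarray:
    """Prepend a frame-rate token so attention layers can condition on it."""
    cond = fps_embed[fps][None, :]           # shape (1, d_model)
    return np.concatenate([cond, traj_tokens], axis=0)

# Example: 8 observed trajectory steps sampled at 2.5 FPS.
obs_tokens = rng.standard_normal((8, d_model))
tokens = condition_on_fps(obs_tokens, 2.5)
print(tokens.shape)  # (9, 16): one conditioning token plus eight steps
```

Because the frame rate is an explicit input rather than implicit in the data spacing, the same weights can be applied at test time to a dataset with a different sampling rate simply by swapping the conditioning token.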
📝 Abstract
While large-scale pre-training has advanced human trajectory prediction, a critical challenge remains: zero-shot transfer to unseen datasets with varying temporal dynamics. State-of-the-art pre-trained models often require fine-tuning to adapt to new datasets with different frame rates or observation horizons, limiting their scalability and practical utility. In this work, we systematically investigate this limitation and propose a robust solution. We first demonstrate that existing data-aware discrete models struggle when transferred to new scenarios with shifted temporal setups. We then isolate temporal generalization from dataset shift, revealing that a simple, explicit conditioning mechanism for temporal metadata is a highly effective solution. Based on this insight, we present OmniTraj, a Transformer-based model pre-trained on a large-scale, heterogeneous dataset. Our experiments show that explicitly conditioning on the frame rate enables OmniTraj to achieve state-of-the-art zero-shot transfer performance, reducing prediction error by over 70% in challenging cross-setup scenarios. After fine-tuning, OmniTraj achieves state-of-the-art results on four datasets: NBA, JTA, WorldPose, and ETH-UCY. The code is publicly available: https://github.com/vita-epfl/omnitraj