Beyond In-Distribution Performance: A Cross-Dataset Study of Trajectory Prediction Robustness

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the out-of-distribution generalization capability of three mainstream trajectory prediction architectures—graph neural networks (GNNs), Transformers, and CNN-based models—under cross-dataset transfer between Argoverse 2 and Waymo Open Motion. It investigates how architectural inductive bias, training data scale, and data augmentation jointly affect robustness. Method: We propose a multi-model comparative experimental framework integrating cross-domain transfer evaluation and fine-grained error attribution analysis. Contribution/Results: Counterintuitively, compact models with strong inductive bias achieve superior cross-domain generalization under limited training data; scaling up training data degrades transfer performance to small target domains, challenging prevailing benchmarking practices. In the A2→WO setting, the smallest model reduces mean ADE by 12.3%; in WO→A2, all models suffer significant degradation, yet high-bias models retain an 8.7% ADE advantage over the second-best performer. These findings underscore the critical role of inductive bias—and caution against data-scale-centric optimization—in safety-critical motion forecasting.
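The summary quotes gains in mean ADE (Average Displacement Error), the standard displacement metric on benchmarks such as Argoverse 2 and Waymo Open Motion. A minimal sketch of the metric and its common multi-modal variant minADE (function names are illustrative, not taken from the paper):

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean L2 distance between
    predicted and ground-truth positions over all timesteps.

    pred, gt: arrays of shape (T, 2) -- T timesteps of (x, y).
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def min_ade(preds, gt):
    """Multi-modal variant: best ADE over K predicted modes.

    preds: array of shape (K, T, 2); gt: array of shape (T, 2).
    """
    return min(ade(p, gt) for p in preds)

# Toy example: prediction offset from ground truth by 1 m at every step.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
print(ade(pred, gt))  # each point is off by exactly 1.0, so ADE = 1.0
```

A "12.3% reduction in mean ADE" therefore means the average per-timestep position error shrank by that fraction on the target-domain test set.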

📝 Abstract
We study the Out-of-Distribution (OoD) generalization ability of three state-of-the-art (SotA) trajectory prediction models with comparable In-Distribution (ID) performance but different model designs. We investigate the influence of inductive bias, training data size, and data augmentation strategy by training the models on Argoverse 2 (A2) and testing on Waymo Open Motion (WO), and vice versa. We find that the smallest model, with the highest inductive bias, exhibits the best OoD generalization across different augmentation strategies when trained on the smaller A2 dataset and tested on the large WO dataset. In the converse setting, training all models on the larger WO dataset and testing on the smaller A2 dataset, we find that all models generalize poorly, even though the model with the highest inductive bias still exhibits the best generalization ability. We discuss possible reasons for this surprising finding and draw conclusions about the design and testing of trajectory prediction models and benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Trajectory Prediction
Model Generalization
Data Augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalization Ability
Inductive Bias
Cross-dataset Testing
Yue Yao
Continental Automotive GmbH; Dahlem Center for Machine Learning and Robotics, Freie Universität Berlin
Daniel Goehring
Assistant Professor (Juniorprofessor), Freie Universität Berlin, Germany
Robotics · Autonomous Vehicles · Machine Learning · Artificial Intelligence
Joerg Reichardt
Continental Automotive GmbH