🤖 AI Summary
This study systematically investigates how dataset design affects multi-agent trajectory prediction performance, focusing on feature efficacy, cross-dataset transferability, and geographic diversity. We introduce a custom L4 Motion Forecasting dataset, recorded in Germany and the US, that features enhanced map representations and agent-centric attributes, and we compare it against the US-centric Argoverse 2 benchmark through cross-dataset experiments and country-wise evaluations. Our findings are threefold: (1) State-of-the-art models achieve high accuracy using only minimal base features; adding supplementary hand-crafted features yields no significant improvement, indicating that existing public datasets already provide sufficient representational capacity. (2) Models exhibit robust knowledge transfer across geographically distinct regions (the U.S. and Germany), suggesting that driving behavior shares more universal patterns than culture-specific ones. (3) We provide empirical evidence that geographic diversity in training data critically enhances model generalization. These results offer practical guidelines for principled trajectory prediction dataset construction.
📝 Abstract
Accurate trajectory prediction is critical for safe autonomous navigation, yet the impact of dataset design on model performance remains understudied. This work systematically examines how feature selection, cross-dataset transfer, and geographic diversity influence trajectory prediction accuracy in multi-agent settings. We evaluate a state-of-the-art model on our novel L4 Motion Forecasting dataset, which is based on our own data recordings in Germany and the US and includes enhanced map and agent features. We compare our dataset to the US-centric Argoverse 2 benchmark. First, we find that incorporating the supplementary map and agent features unique to our dataset yields no measurable improvement over baseline features, demonstrating that modern architectures do not need extensive feature sets for optimal performance. The limited features of public datasets are sufficient to capture intricate interactions without added complexity. Second, we perform cross-dataset experiments to evaluate how effectively domain knowledge can be transferred between datasets. Third, we group our dataset by country and evaluate knowledge transfer between different driving cultures.
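The cross-dataset and country-wise experiments both amount to filling a score matrix indexed by (training domain, evaluation domain). The sketch below illustrates that protocol under stated assumptions. It scores predictions with minADE/minFDE, picking the best of K modes by endpoint error as in the Argoverse 2 motion forecasting metrics, but the abstract does not say which metrics the authors report. The function names, the domain labels `"US"` and `"DE"`, and the constant-velocity stand-in for a trained model are all hypothetical and are not the authors' code.

```python
from typing import Callable, Dict, Sequence, Tuple

import numpy as np

# Hypothetical sample format: (history (H, 2), future ground truth (T, 2)).
Sample = Tuple[np.ndarray, np.ndarray]


def min_ade_fde(pred: np.ndarray, gt: np.ndarray) -> Tuple[float, float]:
    """minADE/minFDE for pred of shape (K, T, 2) against gt of shape (T, 2).

    The best mode is the one with the smallest final displacement error;
    minADE is then the average displacement error of that same mode.
    """
    err = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) per-step L2 errors
    fde = err[:, -1]                                # final displacement per mode
    best = int(np.argmin(fde))                      # mode with smallest endpoint error
    return float(err[best].mean()), float(fde[best])


def transfer_matrix(
    predictors: Dict[str, Callable[[np.ndarray], np.ndarray]],
    datasets: Dict[str, Sequence[Sample]],
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Mean (minADE, minFDE) for every (training domain, evaluation domain) pair."""
    results: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for src, predict in predictors.items():       # model trained on domain `src`
        for tgt, samples in datasets.items():     # evaluated on domain `tgt`
            scores = np.array([min_ade_fde(predict(hist), fut) for hist, fut in samples])
            mean_ade, mean_fde = scores.mean(axis=0).tolist()
            results[(src, tgt)] = (mean_ade, mean_fde)
    return results


def constant_velocity(hist: np.ndarray, horizon: int = 3) -> np.ndarray:
    """Hypothetical stand-in for a trained model: extrapolate the last velocity (K = 1)."""
    v = hist[-1] - hist[-2]
    steps = np.arange(1, horizon + 1)[:, None]
    return (hist[-1] + steps * v)[None]


if __name__ == "__main__":
    hist = np.array([[0.0, 0.0], [1.0, 0.0]])
    datasets = {
        "US": [(hist, np.array([[2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]))],
        "DE": [(hist, np.array([[2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]))],
    }
    predictors = {"US": constant_velocity, "DE": constant_velocity}
    for (src, tgt), (ade, fde) in transfer_matrix(predictors, datasets).items():
        print(f"train={src} eval={tgt} minADE={ade:.2f} minFDE={fde:.2f}")
```

In a real study each entry of `predictors` would be a model trained only on that domain, so off-diagonal cells measure transfer and diagonal cells measure in-domain accuracy; a country-wise split is the same loop with countries as domains.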