🤖 AI Summary
Existing deep learning-based trajectory prediction models struggle to capture strong inter-agent dependencies in complex interactive scenarios, leading to inconsistent predictions that compromise autonomous driving safety. To address this, we propose the first preference optimization framework tailored for vehicle trajectory prediction: it incorporates human behavioral priors—automatically derived from relative preferences over future trajectory sequences (e.g., collision avoidance, social plausibility, and motion smoothness)—into multi-agent joint prediction, thereby enhancing consistency without additional inference overhead. Our method requires no architectural modifications; instead, it achieves improved cooperative rationality via preference-driven fine-tuning alone. Evaluated on three benchmark datasets—nuScenes, Argoverse 2, and INTERACTION—it significantly improves scene consistency metrics (average +12.7%) while maintaining state-of-the-art trajectory accuracy (ADE/FDE remain virtually unchanged), demonstrating both effectiveness and practicality.
📝 Abstract
Trajectory prediction is an essential step in the pipeline of an autonomous vehicle. Inaccurate or inconsistent predictions regarding the movement of agents in its surroundings lead to poorly planned maneuvers and potentially dangerous situations for the end-user. Current state-of-the-art deep-learning-based trajectory prediction models can achieve excellent accuracy on public datasets. However, when used in more complex, interactive scenarios, they often fail to capture important interdependencies between agents, leading to inconsistent predictions among agents in the traffic scene. Inspired by the efficacy of incorporating human preference into large language models, this work fine-tunes trajectory prediction models in multi-agent settings using preference optimization. By taking as input automatically calculated preference rankings among predicted futures in the fine-tuning process, our experiments--using state-of-the-art models on three separate datasets--show that we are able to significantly improve scene consistency while minimally sacrificing trajectory prediction accuracy and without adding any excess computational requirements at inference time.