🤖 AI Summary
Current autonomous driving systems typically treat motion simulation, prediction, and planning as isolated tasks with no unified modeling framework, which limits cross-task generalization and system scalability. This work proposes UniMotion, the first approach to integrate all three tasks within a single framework based on a decoder-only Transformer architecture. By combining multi-agent interaction modeling, spatiotemporal dynamic reasoning, and task-specific training strategies, UniMotion enables shared representation learning and joint optimization across tasks. On the Waymo Open Motion Dataset, the jointly trained model generalizes well across tasks, and after fine-tuning it achieves state-of-the-art performance on multiple tasks, indicating that the three tasks share common structure and benefit from one another.
📝 Abstract
Motion simulation, prediction and planning are foundational tasks in autonomous driving, each essential for modeling and reasoning about dynamic traffic scenarios. While often addressed in isolation due to their differing objectives, such as generating diverse motion states or estimating optimal trajectories, these tasks inherently depend on shared capabilities: understanding multi-agent interactions, modeling motion behaviors, and reasoning over temporal and spatial dynamics. Despite this underlying commonality, existing approaches typically adopt specialized model designs, which hinders cross-task generalization and system scalability. More critically, this separation overlooks the potential mutual benefits among tasks. Motivated by these observations, we propose UniMotion, a unified motion framework that captures shared structures across motion tasks while accommodating their individual requirements. Built on a decoder-only Transformer architecture, UniMotion employs dedicated interaction modes and tailored training strategies to simultaneously support these motion tasks. This unified design not only enables joint optimization and representation sharing but also allows for targeted fine-tuning to specialize in individual tasks when needed. Extensive experiments on the Waymo Open Motion Dataset demonstrate that joint training leads to robust generalization and effective task integration. With further fine-tuning, UniMotion achieves state-of-the-art performance across a range of motion tasks, establishing it as a versatile and scalable solution for autonomous driving.