🤖 AI Summary
Existing methods struggle to model how social interactions among multiple agents evolve over time, which limits the consistency and accuracy of joint predictions. Targeting autonomous driving scenarios, this paper proposes ProgD, a progressive multi-scale decoding framework. First, it builds dynamic heterogeneous graphs to explicitly capture how future inter-agent interactions change over time. Second, it introduces a factorized architecture that processes the spatio-temporal dependencies within future scenarios and progressively reduces uncertainty in the agents' future motions. Finally, it adds a multi-scale decoding procedure to improve future scenario modeling and the consistency of the predicted motions. ProgD achieves state-of-the-art performance on the INTERACTION multi-agent prediction benchmark, where it ranks first, and on the Argoverse 2 multi-world forecasting benchmark.
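To make the idea of a dynamic heterogeneous graph more concrete, the following is a minimal sketch and not the paper's implementation. It assumes two node types ("agent" and "lane"), two edge relations ("interacts" and "near"), and a simple distance threshold for creating edges; the names `Snapshot`, `build_dynamic_graph`, and `radius` are hypothetical and chosen only for illustration. The point is that the edge set is rebuilt at every future time step, so the interaction structure changes as agents move.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Snapshot:
    """Graph for one future time step: typed nodes plus typed edges."""
    step: int
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: set = field(default_factory=set)    # (source, relation, target)


def build_dynamic_graph(agent_tracks, lane_points, radius=10.0):
    """Build one heterogeneous snapshot per time step.

    agent_tracks: {agent_id: [(x, y), ...]} positions per step (assumed input).
    lane_points:  {lane_id: (x, y)} static map anchors (assumed input).
    radius:       distance threshold for adding an edge (illustrative value).
    """
    horizon = min(len(track) for track in agent_tracks.values())
    agent_ids = sorted(agent_tracks)
    snapshots = []
    for t in range(horizon):
        snap = Snapshot(step=t)
        for agent_id in agent_ids:
            snap.nodes[agent_id] = "agent"
        for lane_id in lane_points:
            snap.nodes[lane_id] = "lane"
        for i, a in enumerate(agent_ids):
            ax, ay = agent_tracks[a][t]
            # Agent-agent edges depend on positions at this step only.
            for b in agent_ids[i + 1:]:
                bx, by = agent_tracks[b][t]
                if hypot(ax - bx, ay - by) <= radius:
                    snap.edges.add((a, "interacts", b))
            # Agent-lane edges use a different relation type.
            for lane_id, (lx, ly) in lane_points.items():
                if hypot(ax - lx, ay - ly) <= radius:
                    snap.edges.add((a, "near", lane_id))
        snapshots.append(snap)
    return snapshots


# Two cars approaching each other near a single lane anchor.
tracks = {
    "car_0": [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)],
    "car_1": [(30.0, 0.0), (20.0, 0.0), (12.0, 0.0)],
}
lanes = {"lane_0": (10.0, 2.0)}
graphs = build_dynamic_graph(tracks, lanes)
for g in graphs:
    print(g.step, sorted(g.edges))
```

In this toy scene the first snapshot has no edges, the second links only `car_0` to the lane, and the third links both cars to each other and to the lane, which is the kind of time-varying structure a static graph would miss.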
📝 Abstract
Accurate motion prediction of surrounding agents is crucial for the safe planning of autonomous vehicles. Recent work has extended prediction techniques from individual agents to joint prediction of multiple interacting agents, using various strategies to handle the complex interactions within the agents' future motions. However, these methods overlook the evolving nature of these interactions. To address this limitation, we propose a novel progressive multi-scale decoding strategy, termed ProgD, built on dynamic heterogeneous graph-based scenario modeling. In particular, to explicitly and comprehensively capture the evolving social interactions in future scenarios despite their inherent uncertainty, we model scenarios progressively with dynamic heterogeneous graphs. As these dynamic heterogeneous graphs unfold, a factorized architecture processes the spatio-temporal dependencies within future scenarios and progressively eliminates uncertainty in the future motions of multiple agents. Furthermore, a multi-scale decoding procedure is incorporated to improve future scenario modeling and the consistency of the agents' predicted future motions. The proposed ProgD achieves state-of-the-art performance on the INTERACTION multi-agent prediction benchmark, where it ranks first, and on the Argoverse 2 multi-world forecasting benchmark.
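The progressive multi-scale decoding idea can be illustrated with a toy coarse-to-fine loop. This is a sketch under stated assumptions, not the architecture from the paper: it assumes that "multi-scale" refers to temporal resolution, which the abstract does not spell out, and it replaces each learned decoder stage with a plain callable. The names `upsample`, `refine`, `progressive_decode`, and `no_correction` are hypothetical.

```python
def upsample(waypoints):
    """Double the temporal resolution by inserting midpoints."""
    fine = []
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        fine.append((x0, y0))
        fine.append(((x0 + x1) / 2, (y0 + y1) / 2))
    fine.append(waypoints[-1])
    return fine


def refine(waypoints, residual_fn):
    """Add a per-waypoint correction.

    residual_fn(index) -> (dx, dy) stands in for a learned decoder stage.
    """
    refined = []
    for i, (x, y) in enumerate(waypoints):
        dx, dy = residual_fn(i)
        refined.append((x + dx, y + dy))
    return refined


def progressive_decode(coarse, residual_fns):
    """Return the trajectory after every stage, from coarse to fine."""
    trajectory = list(coarse)
    stages = [trajectory]
    for residual_fn in residual_fns:
        trajectory = refine(upsample(trajectory), residual_fn)
        stages.append(trajectory)
    return stages


def no_correction(index):
    """Placeholder stage that leaves the upsampled waypoints unchanged."""
    return (0.0, 0.0)


# Three coarse waypoints refined over two stages.
coarse = [(0.0, 0.0), (8.0, 0.0), (16.0, 0.0)]
stages = progressive_decode(coarse, [no_correction, no_correction])
for stage in stages:
    print(len(stage), stage[:3])
```

Each stage starts from the previous, coarser estimate, so later stages only need to resolve the detail that remains, which is one plausible reading of how uncertainty is reduced step by step.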