ProgD: Progressive Multi-scale Decoding with Dynamic Graphs for Joint Multi-agent Motion Forecasting

📅 2025-09-11
🤖 AI Summary
Existing joint prediction methods struggle to model how social interactions among multiple agents evolve over time, which limits prediction consistency and accuracy. Targeting autonomous driving scenarios, this paper proposes ProgD, a progressive multi-scale decoding framework. First, it models future scenarios as dynamic heterogeneous graphs that explicitly capture the time-varying nature of inter-agent interactions. Second, as these graphs unfold, a factorized architecture processes the spatio-temporal dependencies within future scenarios and progressively reduces uncertainty in the agents' future motions. Finally, a multi-scale decoding procedure improves future scenario modeling and the consistency of the predicted motions. ProgD achieves state-of-the-art performance on the INTERACTION multi-agent prediction benchmark, where it ranks first, and on the Argoverse 2 multi-world forecasting benchmark.

📝 Abstract
Accurate motion prediction of surrounding agents is crucial for the safe planning of autonomous vehicles. Recent advancements have extended prediction techniques from individual agents to joint predictions of multiple interacting agents, with various strategies to address complex interactions within future motions of agents. However, these methods overlook the evolving nature of these interactions. To address this limitation, we propose a novel progressive multi-scale decoding strategy, termed ProgD, with the help of dynamic heterogeneous graph-based scenario modeling. In particular, to explicitly and comprehensively capture the evolving social interactions in future scenarios, given their inherent uncertainty, we design a progressive modeling of scenarios with dynamic heterogeneous graphs. With the unfolding of such dynamic heterogeneous graphs, a factorized architecture is designed to process the spatio-temporal dependencies within future scenarios and progressively eliminate uncertainty in future motions of multiple agents. Furthermore, a multi-scale decoding procedure is incorporated to improve the future scenario modeling and the consistency of the predicted future motions of agents. The proposed ProgD achieves state-of-the-art performance on the INTERACTION multi-agent prediction benchmark, ranking $1^{st}$, and on the Argoverse 2 multi-world forecasting benchmark.

Problem

Research questions and friction points this paper is trying to address.

Joint multi-agent motion forecasting with dynamic interaction modeling
Progressive decoding to reduce future motion uncertainty
Multi-scale approach for consistent future scenario prediction
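
To make the idea of coarse-to-fine decoding concrete, here is a minimal sketch in plain Python. It is not the ProgD decoder: the names `upsample`, `progressive_decode`, and the `refine` callback are hypothetical, and linear interpolation stands in for whatever learned upsampling a real model would use. It only illustrates the general pattern of predicting a few coarse waypoints first and then refining them at successively finer temporal scales.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def upsample(traj: List[Point], factor: int) -> List[Point]:
    """Linearly interpolate so each coarse segment becomes `factor` fine steps."""
    fine: List[Point] = [traj[0]]
    prev = traj[0]
    for cur in traj[1:]:
        for k in range(1, factor + 1):
            a = k / factor
            fine.append((prev[0] + a * (cur[0] - prev[0]),
                         prev[1] + a * (cur[1] - prev[1])))
        prev = cur
    return fine


def progressive_decode(
    coarse: List[Point],
    factors: List[int],
    refine: Callable[[int, int, Point], Point],
) -> List[Point]:
    """Refine a coarse trajectory scale by scale (hypothetical helper).

    `refine(scale, step, point)` is an assumed placeholder for a learned
    correction applied to every waypoint at the given scale.
    """
    traj = coarse
    for scale, factor in enumerate(factors):
        traj = upsample(traj, factor)                       # move to a finer time grid
        traj = [refine(scale, t, p) for t, p in enumerate(traj)]  # per-scale correction
    return traj


# Two coarse waypoints, refined twice with an identity correction.
fine_traj = progressive_decode([(0.0, 0.0), (4.0, 0.0)], [2, 2], lambda s, t, p: p)
print(len(fine_traj))  # 5
```

In this toy setup each scale doubles the temporal resolution, so early scales fix the overall shape of the motion and later scales only adjust details.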

Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive multi-scale decoding strategy
Dynamic heterogeneous graph-based scenario modeling
Factorized architecture processing spatio-temporal dependencies
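
As a rough illustration of what a dynamic heterogeneous graph over future scenarios could look like, the sketch below builds one graph snapshot per future timestep. Everything here is an assumption for illustration: the function `build_dynamic_graph`, the distance-threshold rule for creating edges, and the relation labels formed from agent types are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Edge = Tuple[str, str, str]  # (source agent id, relation label, target agent id)


def build_dynamic_graph(
    trajs: Dict[str, List[Point]],
    types: Dict[str, str],
    radius: float,
) -> List[List[Edge]]:
    """Return one edge list per timestep (hypothetical proximity-based rule).

    Two agents are connected at timestep t if they are within `radius` of each
    other at t; the relation label records both agent types, which is what
    makes the graph heterogeneous in this toy example.
    """
    horizon = min(len(t) for t in trajs.values())
    ids = sorted(trajs)
    snapshots: List[List[Edge]] = []
    for t in range(horizon):
        edges: List[Edge] = []
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if math.dist(trajs[a][t], trajs[b][t]) <= radius:
                    edges.append((a, f"{types[a]}-{types[b]}", b))
        snapshots.append(edges)
    return snapshots


# A pedestrian approaches a stationary vehicle: the edge appears only at t = 1.
graph = build_dynamic_graph(
    {"car0": [(0.0, 0.0), (0.0, 0.0)], "ped0": [(10.0, 0.0), (1.0, 0.0)]},
    {"car0": "vehicle", "ped0": "pedestrian"},
    radius=2.0,
)
print(graph)
```

The point of the example is that the edge set changes across timesteps, so interaction structure is re-derived as the predicted scenario unfolds rather than fixed once from the observed history.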

Authors

Xing Gao
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Zherui Huang
Shanghai Jiao Tong University
Deep Learning
Weiyao Lin
Professor, Shanghai Jiao Tong University
Multimedia processing, Computer vision, Machine learning, Video coding
Xiao Sun
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China