DECAMP: Towards Scene-Consistent Multi-Agent Motion Prediction with Disentangled Context-Aware Pre-Training

📅 2025-09-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address two limitations of multi-agent trajectory prediction, namely scarce annotated data and insufficient scene modeling, this paper proposes DECAMP, a disentangled context-aware pre-training framework. Methodologically, it introduces a context autoencoder that explicitly decouples behavior pattern learning from latent feature reconstruction, and it designs collaborative spatial-motion pretext tasks that jointly optimize structural representation and dynamic intent modeling. By integrating disentangled representation learning, context-aware encoding, and multi-agent dynamic modeling, the framework improves scene consistency and interpretability. Evaluated on the Argoverse 2 benchmark, the method achieves substantial improvements in both multi-agent motion prediction accuracy and scene plausibility, demonstrating strong effectiveness and generalization capability.

📝 Abstract
Trajectory prediction is a critical component of autonomous driving, essential for ensuring both safety and efficiency on the road. However, traditional approaches often struggle with the scarcity of labeled data and exhibit suboptimal performance in multi-agent prediction scenarios. To address these challenges, we introduce a disentangled context-aware pre-training framework for multi-agent motion prediction, named DECAMP. Unlike existing methods that entangle representation learning with pretext tasks, our framework decouples behavior pattern learning from latent feature reconstruction, prioritizing interpretable dynamics and thereby enhancing scene representation for downstream prediction. Additionally, our framework incorporates context-aware representation learning alongside collaborative spatial-motion pretext tasks, which enables joint optimization of structural and intentional reasoning while capturing the underlying dynamic intentions. Our experiments on the Argoverse 2 benchmark showcase the superior performance of our method, and the results attained underscore its effectiveness in multi-agent motion forecasting. To the best of our knowledge, this is the first context autoencoder framework for multi-agent motion forecasting in autonomous driving. The code and models will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Addresses multi-agent motion prediction challenges in autonomous driving
Overcomes the scarcity of labeled data and weak performance in multi-agent prediction scenarios
Decouples behavior pattern learning from latent feature reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled context-aware pre-training framework
Decouples behavior pattern from feature reconstruction
Joint optimization of structural-intentional reasoning
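To make the disentangling idea above concrete (two pretext heads with separate losses, rather than one objective that entangles behavior learning with feature reconstruction), here is a minimal pure-Python sketch. The linear heads, toy context features, masking scheme, and loss weights are all invented for illustration; they are not the paper's architecture.

```python
import random

random.seed(0)
DIM = 4  # toy per-agent context dimension

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

class LinearHead:
    """A one-output linear probe trained with plain SGD (stand-in for a decoder head)."""
    def __init__(self, dim):
        self.w = [random.uniform(-0.1, 0.1) for _ in range(dim)]
        self.b = 0.0

    def forward(self, x):
        return dot(self.w, x) + self.b

    def sgd_step(self, x, grad, lr=0.05):
        # grad is dLoss/dOutput; apply the chain rule through the linear map
        for i in range(len(self.w)):
            self.w[i] -= lr * grad * x[i]
        self.b -= lr * grad

motion_head = LinearHead(DIM)  # behavior-pattern pretext task (motion prediction)
latent_head = LinearHead(DIM)  # masked-feature reconstruction pretext task

# Toy scenes: one context slot is "masked" (zeroed) but correlated with the
# visible slots, so reconstruction is learnable; motion is a linear function
# of visible context. Both relations are synthetic.
data = []
for _ in range(200):
    c0 = random.uniform(-1, 1)
    c1 = random.uniform(-1, 1)
    c3 = random.uniform(-1, 1)
    c2 = 0.5 * (c0 + c1)          # the feature we pretend to mask
    motion = 2.0 * c0 - c1        # toy ground-truth future displacement
    visible = [c0, c1, 0.0, c3]   # zero out the masked slot
    data.append((visible, motion, c2))

for epoch in range(50):
    for visible, motion, masked_val in data:
        # loss 1: behavior-pattern learning, optimized on its own head
        m_pred = motion_head.forward(visible)
        motion_head.sgd_step(visible, 2.0 * (m_pred - motion))
        # loss 2: latent feature reconstruction, kept separate from loss 1
        l_pred = latent_head.forward(visible)
        latent_head.sgd_step(visible, 2.0 * (l_pred - masked_val))

mse_motion = sum((motion_head.forward(v) - m) ** 2 for v, m, _ in data) / len(data)
mse_latent = sum((latent_head.forward(v) - c) ** 2 for v, _, c in data) / len(data)
print(f"motion MSE: {mse_motion:.4f}  latent MSE: {mse_latent:.4f}")
```

Because each head carries its own loss, the motion objective never has to route through feature reconstruction; in the paper's framing this is what lets behavior pattern learning stay interpretable and separate from the latent pretext task.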
Jianxin Shi
Assistant Professor, Nankai University
Volumetric Video, Multimedia Communications, Mobile Edge Computing
Zengqi Peng
Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology (GZ), Guangzhou, China
Xiaolong Chen
Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology (GZ), Guangzhou, China
Tianyu Wo
School of Computer Science and Engineering, Beihang University, Beijing, China
Jun Ma
Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology (GZ), Guangzhou, China