CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception

📅 2025-07-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cooperative sequential perception tasks, such as cooperative 3D multi-object tracking, have received far less attention than single-frame cooperative perception. To address this gap, the paper proposes CoopTrack, a fully instance-level, end-to-end framework for cooperative tracking. CoopTrack features learnable instance association and transmits sparse instance-level features, which improves perception while keeping transmission costs low. It has two key components: Multi-Dimensional Feature Extraction, which represents each instance with semantic and motion features, and Cross-Agent Association and Aggregation, which adaptively associates and fuses instances across agents based on a feature graph. The method is evaluated on the V2X-Seq and Griffin datasets; on V2X-Seq it attains state-of-the-art results with 39.0% mAP and 32.8% AMOTA.

📝 Abstract
Cooperative perception aims to address the inherent limitations of single-vehicle autonomous driving systems through information exchange among multiple agents. Previous research has primarily focused on single-frame perception tasks. However, the more challenging cooperative sequential perception tasks, such as cooperative 3D multi-object tracking, have not been thoroughly investigated. Therefore, we propose CoopTrack, a fully instance-level end-to-end framework for cooperative tracking, featuring learnable instance association, which fundamentally differs from existing approaches. CoopTrack transmits sparse instance-level features that significantly enhance perception capabilities while maintaining low transmission costs. Furthermore, the framework comprises two key components: Multi-Dimensional Feature Extraction, and Cross-Agent Association and Aggregation, which collectively enable comprehensive instance representation with semantic and motion features, and adaptive cross-agent association and fusion based on a feature graph. Experiments on both the V2X-Seq and Griffin datasets demonstrate that CoopTrack achieves excellent performance. Specifically, it attains state-of-the-art results on V2X-Seq, with 39.0% mAP and 32.8% AMOTA. The project is available at https://github.com/zhongjiaru/CoopTrack.
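
The cross-agent association and fusion idea described above can be illustrated with a minimal sketch. It is not the CoopTrack implementation: the abstract describes a learnable association over a feature graph, whereas this sketch substitutes a fixed cosine-similarity score with greedy one-to-one matching and simple averaging. The function names (`cosine`, `associate`, `fuse`), the `threshold` parameter, and the assumption that both agents' instance features already live in a shared coordinate frame are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associate(ego, other, threshold=0.5):
    """Greedy one-to-one matching of instances by descending similarity.

    Stand-in for a learned association module: the score here is a fixed
    cosine similarity, not a trained network.
    """
    scored = sorted(
        ((cosine(e, o), i, j) for i, e in enumerate(ego) for j, o in enumerate(other)),
        reverse=True,
    )
    used_ego, used_other, matches = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are even less similar
        if i in used_ego or j in used_other:
            continue
        used_ego.add(i)
        used_other.add(j)
        matches.append((i, j))
    return sorted(matches)


def fuse(ego, other, matches):
    """Average matched instance features; keep unmatched ones from both agents."""
    partner = dict(matches)
    matched_other = set(partner.values())
    fused = []
    for i, e in enumerate(ego):
        if i in partner:
            o = other[partner[i]]
            fused.append([(x + y) / 2 for x, y in zip(e, o)])
        else:
            fused.append(list(e))
    # Instances seen only by the other agent extend the ego view.
    fused.extend(list(o) for j, o in enumerate(other) if j not in matched_other)
    return fused


# Hypothetical toy features: two ego instances, two received instances.
ego_instances = [[1.0, 0.0], [0.0, 1.0]]
received_instances = [[0.9, 0.1], [-1.0, 0.0]]
matches = associate(ego_instances, received_instances)
fused_instances = fuse(ego_instances, received_instances, matches)
```

In this toy case only the first ego instance finds a partner, so the fused set holds three instances: one averaged, one ego-only, and one contributed solely by the other agent. Only the small per-instance vectors cross the link, which is the intuition behind transmitting sparse instance-level features rather than dense feature maps.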
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of single-vehicle perception via multi-agent cooperation
Focuses on cooperative sequential perception, not just single-frame tasks
Proposes efficient instance-level tracking with low transmission costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end learning for cooperative sequential perception
Learnable instance association framework
Sparse instance-level feature transmission
Jiaru Zhong
Institute for AI Industry Research, Tsinghua University; The Hong Kong Polytechnic University
Jiahao Wang
School of Vehicle and Mobility, Tsinghua University
Jiahui Xu
ETH Zurich
Electronic Design Automation, Formal Verification
Xiaofan Li
East China Normal University
Computer Vision
Zaiqing Nie
Tsinghua University
NLP, Data Mining, Machine Learning
Haibao Yu
The University of Hong Kong; Institute for AI Industry Research, Tsinghua University