CooPre: Cooperative Pretraining for V2X Cooperative Perception

📅 2024-08-20
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
V2X cooperative perception critically relies on costly and scarce 3D annotations for multi-agent scenarios, hindering scalable deployment. Method: This paper proposes the first self-supervised point cloud pre-training framework tailored for V2X cooperative perception. It introduces a novel cooperative point cloud reconstruction pretext task and a BEV-guided heterogeneous masking strategy, both designed for the heterogeneous (vehicle and infrastructure), multi-agent perception setting. Crucially, the framework requires no 3D annotations: it pre-trains LiDAR encoders compatible with mainstream cooperative perception backbones using only large-scale unlabeled multi-source LiDAR point clouds. Contribution/Results: Extensive experiments on three major benchmarks (V2X-Real, V2V4Real, and OPV2V) show consistent detection accuracy improvements, along with markedly better cross-domain transferability, few-shot generalization, and robustness under challenging conditions such as occlusion and communication constraints.
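The masking step can be pictured concretely. Below is a minimal NumPy sketch of per-agent masking over occupied BEV cells, in the spirit of the BEV-guided strategy described above; the function name `bev_guided_mask`, the grid parameters, and the mask ratio are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

def bev_guided_mask(points_per_agent, voxel_size=0.4,
                    pc_range=(-50.0, -50.0, 50.0, 50.0),
                    mask_ratio=0.7, rng=None):
    """Mask LiDAR points per agent at the level of occupied BEV cells.

    points_per_agent: list of (N_i, 3+) arrays, already projected into the
    ego frame and cropped to pc_range.
    Returns (visible, masked): per-agent lists of point arrays.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x_min, y_min, x_max, y_max = pc_range
    n_cells_y = int(round((y_max - y_min) / voxel_size))
    visible, masked = [], []
    for pts in points_per_agent:
        # Hash each point to a flat BEV cell index.
        ix = ((pts[:, 0] - x_min) / voxel_size).astype(np.int64)
        iy = ((pts[:, 1] - y_min) / voxel_size).astype(np.int64)
        cell = ix * n_cells_y + iy
        occupied = np.unique(cell)
        # Mask a fixed fraction of this agent's occupied cells.
        n_mask = int(mask_ratio * len(occupied))
        mask_cells = rng.choice(occupied, size=n_mask, replace=False)
        is_masked = np.isin(cell, mask_cells)
        visible.append(pts[~is_masked])
        masked.append(pts[is_masked])
    return visible, masked

# Example: two agents with random points inside the range.
agents = [np.random.uniform(-50, 50, size=(1000, 3)) for _ in range(2)]
vis, msk = bev_guided_mask(agents)
```

The visible points from all agents would then feed the LiDAR encoder, while the masked points serve as reconstruction targets for the pretext task.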

📝 Abstract
Existing Vehicle-to-Everything (V2X) cooperative perception methods rely on accurate multi-agent 3D annotations. Nevertheless, it is time-consuming and expensive to collect and annotate real-world data, especially for V2X systems. In this paper, we present a self-supervised learning method for V2X cooperative perception, which utilizes the vast amount of unlabeled 3D V2X data to enhance perception performance. Beyond simply extending previous pre-training methods for point-cloud representation learning, we introduce a novel self-supervised Cooperative Pretraining framework (termed CooPre) customized for the collaborative scenario. We point out that cooperative point-cloud sensing compensates for information loss among agents. This motivates us to design a novel proxy task in which the 3D encoder reconstructs LiDAR point clouds across different agents. In addition, we develop a V2X bird's-eye-view (BEV) guided masking strategy which effectively allows the model to attend to 3D features across heterogeneous V2X agents (i.e., vehicles and infrastructure) in the BEV space. Notably, this masking strategy effectively pretrains the 3D encoder and is compatible with mainstream cooperative perception backbones. Our approach, validated through extensive experiments on representative datasets (i.e., V2X-Real, V2V4Real, and OPV2V), leads to a performance boost across all V2X settings. Additionally, we demonstrate the framework's improvements in cross-domain transferability, data efficiency, and robustness under challenging scenarios. The code will be made publicly available.
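The abstract does not specify the reconstruction objective, but masked point cloud pretraining commonly scores reconstructions with a symmetric Chamfer distance between predicted and ground-truth point sets. A minimal PyTorch sketch of such a loss, under that assumption:

```python
import torch

def chamfer_loss(pred, target):
    """Symmetric Chamfer distance between predicted and target point sets.

    pred:   (B, M, 3) points decoded for the masked regions.
    target: (B, N, 3) ground-truth points, e.g. aggregated from all agents.
    """
    # Pairwise squared Euclidean distances, shape (B, M, N).
    dist = torch.cdist(pred, target) ** 2
    # Nearest neighbor in each direction, averaged over points and batch.
    pred_to_target = dist.min(dim=2).values.mean()
    target_to_pred = dist.min(dim=1).values.mean()
    return pred_to_target + target_to_pred

# Example with random tensors standing in for decoder output and targets.
loss = chamfer_loss(torch.randn(2, 128, 3), torch.randn(2, 256, 3))
```

Because the target set can merge points from all cooperating agents, the encoder is pushed to reconstruct geometry its own agent never observed, which is the cooperative twist on the standard masked-reconstruction recipe.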
Problem

Research questions and friction points this paper is trying to address.

Multi-agent 3D annotations for V2X scenarios are costly and scarce, limiting scalable deployment
Single-agent LiDAR sensing loses information to occlusion and limited range
Large volumes of unlabeled 3D V2X data go unused by supervised pipelines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised cooperative pretraining (CooPre) framework for V2X perception
Cross-agent LiDAR point cloud reconstruction as a pretext task
BEV-guided heterogeneous masking that directs attention to 3D features across vehicle and infrastructure agents
Annotation-free pretraining of LiDAR encoders that plug into mainstream cooperative perception backbones (see the sketch below)
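Since pretraining touches only the LiDAR encoder, the learned weights can warm-start an off-the-shelf detector before supervised fine-tuning. A toy illustration of that handoff; `TinyBEVEncoder` is a hypothetical stand-in, not CooPre's actual backbone:

```python
import torch
import torch.nn as nn

# TinyBEVEncoder is a toy stand-in for a real LiDAR backbone (e.g., a
# pillar/voxel encoder); CooPre's actual modules and names differ.
class TinyBEVEncoder(nn.Module):
    def __init__(self, in_ch=64, out_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )

    def forward(self, bev):  # bev: (B, C, H, W) pseudo-image from pillars
        return self.net(bev)

# 1) Pretrain with the reconstruction pretext task (no 3D labels), then save.
encoder = TinyBEVEncoder()
torch.save(encoder.state_dict(), "coopre_encoder.pt")

# 2) Warm-start the detector's encoder before supervised fine-tuning.
detector_encoder = TinyBEVEncoder()
detector_encoder.load_state_dict(torch.load("coopre_encoder.pt"))
```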