🤖 AI Summary
This work addresses the gap that no existing unsupervised method handles both multi-agent and single-agent 3D perception simultaneously. It proposes the first unified unsupervised framework that enhances point cloud density by sharing LiDAR data among agents and leverages cooperative viewpoints to generate high-quality pseudo-labels, enabling joint optimization of single-view and multi-view 3D object detection and classification. The method introduces a learning-based Proposal Purifying Filter, a Progressive Proposal Stabilizing module driven by easy-to-hard curriculum learning, and a Cross-View Consensus Learning mechanism. Evaluated on the V2V4Real and OPV2V datasets under fully unsupervised settings, the proposed approach significantly outperforms current state-of-the-art methods on both perception tasks.
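The density-cooperation idea above can be sketched in a few lines: each agent transforms its LiDAR points into a shared ego frame and the clouds are concatenated, yielding a denser point cloud for proposal classification. This is a minimal illustration, not the paper's implementation; the function name `merge_agent_clouds` and the NumPy-based representation are assumptions.

```python
import numpy as np

def merge_agent_clouds(clouds, poses):
    """Fuse per-agent LiDAR point clouds into the ego frame (illustrative sketch).

    clouds: list of (N_i, 3) arrays, each in its agent's local frame.
    poses:  list of 4x4 agent-to-ego homogeneous transforms
            (the ego agent's own pose is the identity).
    Returns a single (sum N_i, 3) array of higher point density.
    """
    merged = []
    for pts, T in zip(clouds, poses):
        homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (N_i, 4) homogeneous coords
        merged.append((homo @ T.T)[:, :3])                   # map into the ego frame
    return np.vstack(merged)
```

A denser merged cloud makes object surfaces more complete, which is what lets the unsupervised classifier separate foreground proposals from clutter more reliably.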
📝 Abstract
LiDAR-based multi-agent and single-agent perception has shown promising performance in environmental understanding for robots and automated vehicles. However, no existing method solves both multi-agent and single-agent perception simultaneously in an unsupervised way. By sharing sensor data between multiple agents via communication, this paper uncovers two key insights: 1) the improved point cloud density after data sharing from cooperative views can benefit unsupervised object classification, and 2) the cooperative view of multiple agents can serve as unsupervised guidance for 3D object detection in the single view. Based on these insights, we propose an Unsupervised Multi-agent and Single-agent (UMS) perception framework that leverages multi-agent cooperation, without human annotations, to solve multi-agent and single-agent perception simultaneously. UMS combines a learning-based Proposal Purifying Filter, which better classifies candidate proposals after multi-agent point cloud density cooperation, with a Progressive Proposal Stabilizing module, which yields reliable pseudo-labels through easy-to-hard curriculum learning. Furthermore, we design a Cross-View Consensus Learning mechanism that uses the multi-agent cooperative view to guide detection in the single-agent view. Experimental results on two public datasets, V2V4Real and OPV2V, show that our UMS method achieves significantly higher 3D detection performance than state-of-the-art methods on both multi-agent and single-agent perception tasks in an unsupervised setting.
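To make the cross-view guidance idea concrete, the toy objective below matches a single agent's predicted box centers to pseudo-labels derived from the cooperative view and penalizes their disagreement. This is a sketch under our own assumptions (the name `consensus_loss`, center-only matching, and a fixed distance threshold are all hypothetical), not the Cross-View Consensus Learning loss defined in the paper.

```python
import numpy as np

def consensus_loss(single_view_centers, coop_pseudo_centers, match_thresh=2.0):
    """Toy cross-view consistency objective (illustrative only).

    single_view_centers: (M, 3) predicted box centers from one agent's view.
    coop_pseudo_centers: (K, 3) pseudo-label centers from the cooperative view.
    Each prediction is matched to its nearest pseudo-label; the loss is the
    mean squared center distance over pairs closer than `match_thresh` meters.
    """
    if len(single_view_centers) == 0 or len(coop_pseudo_centers) == 0:
        return 0.0
    # (M, K) pairwise center distances between predictions and pseudo-labels
    d = np.linalg.norm(
        single_view_centers[:, None, :] - coop_pseudo_centers[None, :, :], axis=-1
    )
    nearest = d.min(axis=1)                   # closest pseudo-label per prediction
    matched = nearest[nearest < match_thresh]  # ignore unmatched (far) predictions
    return float((matched ** 2).mean()) if matched.size else 0.0
```

Minimizing such a term pulls single-view detections toward the (denser, more reliable) cooperative-view pseudo-labels, which is the general mechanism the abstract describes.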