๐ค AI Summary
To address the ambiguity in 3D object detection caused by missing depth information in vision-only collaborative perception, this paper proposes a ray-fusion-based collaborative visual perception method. The core innovation is the first introduction of ray-wise occupancy modeling, which leverages camera geometric priors to align and fuse ray-level occupancy predictions across multiple vehicle views, effectively suppressing redundant responses and false positives along the line of sight. By integrating differentiable ray sampling with occupancy modeling, the method enhances depth perception robustlyโwithout requiring depth supervision or auxiliary sensors. Extensive experiments demonstrate state-of-the-art performance on major collaborative perception benchmarks, including DAIR-V2X and V2XSet, achieving absolute gains of 8.2โ12.6% in 3D detection mAP. The source code is publicly available.
๐ Abstract
Collaborative visual perception methods have gained widespread attention in the autonomous driving community in recent years due to their ability to address sensor limitation problems. However, the absence of explicit depth information often makes it difficult for camera-based perception systems, e.g., 3D object detection, to generate accurate predictions. To alleviate the ambiguity in depth estimation, we propose RayFusion, a ray-based fusion method for collaborative visual perception. Using ray occupancy information from collaborators, RayFusion reduces redundancy and false positive predictions along camera rays, enhancing the detection performance of purely camera-based collaborative perception systems. Comprehensive experiments show that our method consistently outperforms existing state-of-the-art models, substantially advancing the performance of collaborative visual perception. The code is available at https://github.com/wangsh0111/RayFusion.