🤖 AI Summary
Problem: Existing V2X cooperative perception research predominantly relies on simulation or static datasets, lacking empirical validation of the real-time feasibility of intermediate fusion in dynamic, real-world scenarios.
Method: We propose the first online cooperative perception framework deployed on a real-world vehicle-infrastructure cooperative platform, supporting early-, intermediate-, and late-fusion paradigms. It achieves millisecond-level end-to-end latency for intermediate fusion in complex urban environments.
Contribution/Results: We introduce a novel dynamic V2X benchmark dataset built on synchronized ROS bags, providing 25,028 test frames with 6,850 annotated key frames. The system integrates multi-sensor spatiotemporal synchronization, low-latency V2X communication protocols, and a dynamic annotation toolchain. Experimental results demonstrate significant improvements in occluded-region perception accuracy. This work provides critical empirical validation and open-source infrastructure for the engineering deployment of V2X intermediate fusion.
📝 Abstract
Cooperative perception enabled by Vehicle-to-Everything (V2X) communication holds significant promise for enhancing the perception capabilities of autonomous vehicles, allowing them to overcome occlusions and extend their field of view. However, existing research predominantly relies on simulated environments or static datasets, leaving the feasibility and effectiveness of V2X cooperative perception, especially for intermediate fusion, largely unexplored in real-world scenarios. In this work, we introduce V2X-ReaLO, an open online cooperative perception framework deployed on real vehicles and smart infrastructure that integrates early, late, and intermediate fusion methods within a unified pipeline. It provides the first practical demonstration of the feasibility and performance of online intermediate fusion under genuine real-world conditions. Additionally, we present an open benchmark dataset specifically designed to assess the performance of online cooperative perception systems. This new dataset extends the V2X-Real dataset to dynamic, synchronized ROS bags and provides 25,028 test frames with 6,850 annotated key frames in challenging urban scenarios. By enabling real-time assessments of perception accuracy and communication latency under dynamic conditions, V2X-ReaLO sets a new benchmark for advancing and optimizing cooperative perception systems in real-world applications. The code and datasets will be released to further advance the field.