🤖 AI Summary
Existing V2V cooperative perception datasets are largely confined to routine traffic scenarios and lack high-quality multimodal data under complex adverse weather and illumination conditions, hindering the robustness of autonomous driving perception in challenging environments. To address this, we introduce the first real-world V2V cooperative perception dataset specifically designed for complex adverse traffic scenarios. It features hardware-level temporal synchronization between the two collection vehicles, capturing LiDAR, multi-view camera, RTK-GNSS, and IMU data under ten distinct weather and illumination conditions. We propose a target-based temporal alignment method to achieve high-precision cross-modal spatiotemporal synchronization. Furthermore, we provide time-consistent 3D object annotations and static scene reconstruction, enabling 4D bird's-eye-view (BEV) modeling. The dataset comprises 100 sequences, 60K LiDAR sweeps, 1.26M images, and 750K high-precision localization records, making it the largest and highest-quality V2V cooperative perception benchmark of its kind to date.
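To give a rough sense of what cross-modal temporal alignment involves, the snippet below is a generic sketch (not the paper's target-based method): it matches each 10 Hz LiDAR sweep to its nearest 30 Hz camera frame and linearly interpolates a higher-rate GNSS/IMU pose track to the sweep timestamps. The function names, sensor rates used here, and toy timelines are illustrative assumptions.

```python
import numpy as np

def nearest_indices(query_ts, ref_ts):
    """For each query timestamp, return the index of the closest reference timestamp."""
    idx = np.searchsorted(ref_ts, query_ts)
    idx = np.clip(idx, 1, len(ref_ts) - 1)
    left, right = ref_ts[idx - 1], ref_ts[idx]
    return np.where(query_ts - left < right - query_ts, idx - 1, idx)

def interpolate_positions(query_ts, pose_ts, positions):
    """Linearly interpolate (N, 3) positions to the query timestamps."""
    return np.stack(
        [np.interp(query_ts, pose_ts, positions[:, k]) for k in range(3)], axis=1
    )

# Toy timelines over one second: 10 Hz LiDAR, 30 Hz camera, 100 Hz GNSS/IMU.
lidar_ts = np.arange(0.0, 1.0, 0.10)
cam_ts = np.arange(0.0, 1.0, 1.0 / 30.0)
pose_ts = np.arange(0.0, 1.0, 0.01)
positions = np.column_stack(
    [pose_ts * 10.0, np.zeros_like(pose_ts), np.zeros_like(pose_ts)]
)

cam_for_lidar = nearest_indices(lidar_ts, cam_ts)                     # camera frame per sweep
pose_at_lidar = interpolate_positions(lidar_ts, pose_ts, positions)  # interpolated pose per sweep
print(cam_for_lidar[:5], pose_at_lidar[:2])
```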
📝 Abstract
Vehicle-to-Vehicle (V2V) cooperative perception has great potential to enhance autonomous driving performance by overcoming perception limitations in complex adverse traffic scenarios (CATS). Meanwhile, data serves as the fundamental infrastructure for modern autonomous driving AI. However, due to stringent data collection requirements, existing datasets focus primarily on ordinary traffic scenarios, constraining the benefits of cooperative perception. To address this challenge, we introduce CATS-V2V, the first-of-its-kind real-world dataset for V2V cooperative perception under complex adverse traffic scenarios. The dataset was collected by two hardware time-synchronized vehicles, covering 10 weather and lighting conditions across 10 diverse locations. The 100-clip dataset includes 60K frames of 10 Hz LiDAR point clouds and 1.26M multi-view 30 Hz camera images, along with 750K anonymized yet high-precision RTK-fixed GNSS and IMU records. Correspondingly, we provide time-consistent 3D bounding box annotations for objects, as well as static scene annotations used to construct a 4D BEV representation. On this basis, we propose a target-based temporal alignment method that ensures all objects are precisely aligned across all sensor modalities. We hope that CATS-V2V, the largest-scale, most broadly supportive, and highest-quality dataset of its kind to date, will benefit the autonomous driving community in related tasks.
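To make the cooperative setup concrete, the following minimal sketch shows the basic fusion step underlying a shared BEV view: transforming each vehicle's LiDAR sweep into a common world frame using its synchronized, RTK-fixed pose, then merging the two clouds. The pose convention (planar yaw), function names, and toy data are assumptions for illustration, not the dataset's released toolkit.

```python
import numpy as np

def pose_to_matrix(x, y, z, yaw):
    """Build a 4x4 ego-to-world transform from position and heading (yaw, radians)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = [x, y, z]
    return T

def to_world(points_ego, T_ego_to_world):
    """Transform an (N, 3) point cloud from the ego frame to the world frame."""
    homo = np.hstack([points_ego, np.ones((len(points_ego), 1))])
    return (homo @ T_ego_to_world.T)[:, :3]

# Toy example: one temporally aligned sweep per vehicle, each with its own pose.
sweep_a = np.random.rand(1000, 3) * 50.0
sweep_b = np.random.rand(1000, 3) * 50.0
T_a = pose_to_matrix(0.0, 0.0, 0.0, 0.0)
T_b = pose_to_matrix(30.0, 5.0, 0.0, np.pi / 2)

# Combined cooperative point cloud in the shared frame, ready for BEV rasterization.
merged = np.vstack([to_world(sweep_a, T_a), to_world(sweep_b, T_b)])
print(merged.shape)
```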