🤖 AI Summary
This study systematically quantifies the sim-to-real gap in CARLA’s Dynamic Vision Sensor (DVS) module—specifically its fidelity in event camera modeling—and its impact on traffic object detection performance.
Method: We propose an evaluation paradigm in which a Recurrent Vision Transformer (RVT) is trained exclusively on synthetic event data generated by CARLA's DVS and then evaluated for cross-domain generalization on real-world event streams and on mixtures of synthetic and real data.
Contribution/Results: We present the first quantitative evidence of significant simulation distortion in CARLA’s DVS: models trained solely on synthetic events suffer over 40% mAP degradation on real event data, whereas models trained on real data exhibit strong cross-domain robustness. These findings identify insufficient DVS simulation fidelity as the primary bottleneck limiting event-based perception performance. Consequently, improving simulation accuracy and developing event-camera-specific domain adaptation methods are critically needed.
📝 Abstract
Event cameras are gaining traction in traffic monitoring applications due to their low latency, high temporal resolution, and energy efficiency, which make them well-suited for real-time object detection at traffic intersections. However, the development of robust event-based detection models is hindered by the limited availability of annotated real-world datasets. To address this, several simulation tools have been developed to generate synthetic event data. Among these, the CARLA driving simulator includes a built-in dynamic vision sensor (DVS) module that emulates event camera output. Despite its potential, the sim-to-real gap for event-based object detection remains insufficiently studied. In this work, we present a systematic evaluation of this gap by training a recurrent vision transformer model exclusively on synthetic data generated using CARLA's DVS and testing it on varying combinations of synthetic and real-world event streams. Our experiments show that models trained solely on synthetic data perform well on synthetic-heavy test sets but suffer significant performance degradation as the proportion of real-world data increases. In contrast, models trained on real-world data demonstrate stronger generalization across domains. This study offers the first quantifiable analysis of the sim-to-real gap in event-based object detection using CARLA's DVS. Our findings highlight limitations in current DVS simulation fidelity and underscore the need for improved domain adaptation techniques in neuromorphic vision for traffic monitoring.
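To make concrete what a DVS module such as CARLA's approximates, the sketch below implements the standard idealized event-camera model: each pixel keeps a reference log intensity and emits a signed event whenever the current log intensity deviates from that reference by more than a contrast threshold. This is a minimal illustration of the general formulation, not CARLA's actual implementation; the function name, threshold value, and frame inputs are assumptions for the example.

```python
import numpy as np

def dvs_step(curr_frame, ref_log, threshold=0.2, eps=1e-6):
    """One step of an idealized DVS simulation (illustrative sketch).

    curr_frame : 2-D array of linear pixel intensities for the new frame.
    ref_log    : per-pixel reference log intensity (memory of last event).
    Returns a list of (x, y, polarity) events and the updated reference.
    """
    log_i = np.log(curr_frame.astype(np.float64) + eps)
    diff = log_i - ref_log
    fired = np.abs(diff) >= threshold      # contrast threshold crossed
    ys, xs = np.nonzero(fired)
    pol = np.where(diff[ys, xs] > 0, 1, -1)  # +1 brighter, -1 darker
    ref_log[ys, xs] = log_i[ys, xs]        # reset memory at firing pixels
    return list(zip(xs.tolist(), ys.tolist(), pol.tolist())), ref_log
```

Real sensors add per-pixel threshold mismatch, refractory periods, and noise, which is precisely where simulated and real event streams diverge and the sim-to-real gap studied here originates.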