🤖 AI Summary
Current synthetic datasets for autonomous driving suffer from limited scale, low photorealism, single-modality representation, and poor task adaptability. To address these limitations, we introduce RealDriveSim, a full-stack, high-fidelity, multimodal (RGB + LiDAR) synthetic dataset designed specifically for autonomous driving. It provides fine-grained annotations across 64 object classes and supports multiple downstream tasks, including 2D object detection, bird's-eye-view (BEV) semantic segmentation, and LiDAR point-wise semantic segmentation. The generation methodology integrates a high-precision urban simulation engine, a physically based rendering pipeline, NeRF-enhanced texture synthesis for improved realism, and a programmable LiDAR simulator, ensuring spatiotemporal synchronization across sensors and geometric-semantic consistency. Extensive evaluations demonstrate state-of-the-art performance on multiple benchmarks, surpassing existing synthetic baselines. The dataset is publicly released at https://realdrivesim.github.io/.
📝 Abstract
As perception models continue to develop, the need for large-scale datasets increases. However, data annotation remains far too expensive to scale effectively and meet this demand. Synthetic datasets offer a way to boost model performance at substantially reduced cost, yet existing synthetic datasets remain limited in scope and realism and are designed for specific tasks and applications. In this work, we present RealDriveSim, a realistic multi-modal synthetic dataset for autonomous driving that supports not only popular 2D computer vision applications but also their LiDAR counterparts, providing fine-grained annotations for up to 64 classes. We extensively evaluate our dataset across a wide range of applications and domains, demonstrating state-of-the-art results compared to existing synthetic benchmarks. The dataset is publicly available at https://realdrivesim.github.io/.
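For orientation, the sketch below shows how one might load a single synchronized multimodal frame of the kind described above: an RGB image with 2D boxes, a LiDAR point cloud with point-wise semantic labels, and a BEV semantic mask. This is a minimal, purely illustrative sketch; the directory layout, file names, and array shapes are assumptions and do not reflect the official release format, which is documented on the dataset page.

```python
from dataclasses import dataclass
from pathlib import Path

import numpy as np
from PIL import Image


@dataclass
class MultimodalFrame:
    """One synchronized frame: camera image, LiDAR points, and labels (hypothetical layout)."""
    image: np.ndarray         # (H, W, 3) uint8 RGB image
    boxes_2d: np.ndarray      # (N, 5) float: class_id, x_min, y_min, x_max, y_max
    points: np.ndarray        # (P, 4) float: x, y, z, intensity
    point_labels: np.ndarray  # (P,) int: per-point semantic class in [0, 63]
    bev_mask: np.ndarray      # (H_bev, W_bev) int: BEV semantic class per cell


def load_frame(root: Path, frame_id: str) -> MultimodalFrame:
    """Load one frame from an assumed on-disk layout (not the official format)."""
    image = np.asarray(Image.open(root / "rgb" / f"{frame_id}.png").convert("RGB"))
    boxes_2d = np.load(root / "boxes_2d" / f"{frame_id}.npy")
    points = np.load(root / "lidar" / f"{frame_id}.npy")
    point_labels = np.load(root / "lidar_labels" / f"{frame_id}.npy")
    bev_mask = np.asarray(Image.open(root / "bev" / f"{frame_id}.png"))
    return MultimodalFrame(image, boxes_2d, points, point_labels, bev_mask)
```

A frame loaded this way, e.g. `load_frame(Path("/data/realdrivesim"), "000001")`, would carry everything needed for the three tasks mentioned above: the image and boxes for 2D detection, the BEV mask for BEV segmentation, and the point cloud with per-point labels for LiDAR semantic segmentation.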