🤖 AI Summary
Existing BEV perception datasets suffer from limited scale, narrow scene diversity, and high annotation cost, which hinders robust multi-sensor fusion and multi-task learning. SimBEV addresses this: an extensively configurable, scalable framework for randomized synthetic data generation that combines information from multiple sources to capture accurate BEV ground truth, models a comprehensive array of sensors, and produces unified annotations for multiple perception tasks, including BEV semantic segmentation and 3D object detection. The accompanying SimBEV dataset is a large, annotated collection of multi-sensor perception data spanning diverse driving scenarios.
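To make "extensively configurable and randomized" concrete, the sketch below shows what drawing one randomized scenario configuration for such a tool *could* look like. This is purely illustrative: the field names, value ranges, and structure are assumptions of ours, not SimBEV's actual configuration schema or API.

```python
# Hypothetical sketch of a randomized scenario configuration.
# Every field and value range below is an illustrative assumption,
# not SimBEV's actual configuration schema.
import random

def make_scenario_config(seed: int) -> dict:
    """Draw one randomized driving-scenario configuration."""
    rng = random.Random(seed)  # seeded RNG for reproducible scenarios
    return {
        "map": rng.choice(["town_urban", "town_suburban", "town_highway"]),
        "weather": {
            "rain": rng.uniform(0.0, 1.0),
            "fog": rng.uniform(0.0, 0.5),
            "sun_altitude_deg": rng.uniform(-10.0, 80.0),
        },
        "traffic": {
            "num_vehicles": rng.randint(20, 150),
            "num_pedestrians": rng.randint(0, 60),
        },
        "sensors": {
            "cameras": 6,   # e.g. a surround-view rig
            "lidars": 1,
            "radars": 5,
        },
        "bev_grid": {"range_m": 50.0, "resolution_m": 0.25},
    }

if __name__ == "__main__":
    print(make_scenario_config(seed=42))
```

Seeding the generator, as above, is one way such a tool can keep randomized scenarios reproducible across runs.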
📝 Abstract
Bird's-eye view (BEV) perception for autonomous driving has garnered significant attention in recent years, in part because the BEV representation facilitates the fusion of multi-sensor data. This enables a variety of perception tasks, including BEV segmentation, which provides a concise view of the environment that can be used to plan the vehicle's trajectory. However, this representation is not fully supported by existing datasets, and creating new datasets can be time-consuming. To address this problem, in this paper we introduce SimBEV, an extensively configurable and scalable randomized synthetic data generation tool that incorporates information from multiple sources to capture accurate BEV ground truth data, supports a comprehensive array of sensors, and enables a variety of perception tasks, including BEV segmentation and 3D object detection. We use SimBEV to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios.
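As a rough illustration of how BEV ground truth can be captured in simulation, the sketch below rasterizes simulator object state into a BEV semantic mask on a fixed ego-centered grid. The function name, class IDs, and grid parameters are hypothetical placeholders; the paper's abstract does not specify SimBEV's annotation pipeline.

```python
# Minimal sketch of producing a BEV semantic mask from simulator state.
# All names, class IDs, and grid parameters are hypothetical placeholders,
# not SimBEV's real annotation pipeline.
import numpy as np

def rasterize_bev_segmentation(objects, grid_range_m=50.0, res_m=0.25):
    """Mark each object's center cell in a BEV semantic mask (assumed class IDs)."""
    size = int(2 * grid_range_m / res_m)
    mask = np.zeros((size, size), dtype=np.uint8)  # 0 = background
    class_ids = {"vehicle": 1, "pedestrian": 2, "drivable": 3}
    for obj in objects:
        # Convert ego-frame (x, y) coordinates in meters to grid indices.
        i = int((obj["x"] + grid_range_m) / res_m)
        j = int((obj["y"] + grid_range_m) / res_m)
        if 0 <= i < size and 0 <= j < size:
            mask[i, j] = class_ids.get(obj["class"], 0)
    return mask

# Example: two ground-truth objects expressed in the ego frame.
objects = [
    {"class": "vehicle", "x": 10.0, "y": -3.5},
    {"class": "pedestrian", "x": -4.0, "y": 7.25},
]
bev_mask = rasterize_bev_segmentation(objects)
print(bev_mask.shape, np.unique(bev_mask))  # (400, 400) with labels {0, 1, 2}
```

A real pipeline would rasterize full object footprints and map layers rather than single center cells, but the grid construction shown here is the core of mapping simulator ground truth into BEV space.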