🤖 AI Summary
To address the bottleneck of autonomous driving's reliance on large-scale, costly annotated datasets, this paper proposes the first diffusion model capable of scalably generating high-quality, freely annotated driving data. Methodologically, it introduces a novel subject-control mechanism that integrates multi-source external data for precise content control. By combining a cross-modal subject encoder, controllable spatiotemporal conditioning, and multi-source alignment, the model enables joint scaling of data volume and diversity, demonstrating for the first time that synthetically generated data can meaningfully drive iterative model improvement. Experiments show that models trained solely on generated data reach 78–92% of the performance of real-data training on BEV segmentation and 3D detection tasks, and that doubling the synthetic dataset yields an average mAP gain of 4.3%, significantly outperforming existing baselines.
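The summary does not specify how the subject-control mechanism injects external subject data into the diffusion model. A common pattern for this kind of conditioning is cross-attention, where the denoiser's spatial tokens attend over embeddings of the external subjects. The sketch below illustrates only that generic pattern, not SubjectDrive's actual architecture; all names and shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def subject_cross_attention(frame_tokens, subject_tokens):
    """Toy cross-attention: each denoiser token attends over subject embeddings.

    frame_tokens:   (num_tokens, dim)  spatial tokens inside the denoiser
    subject_tokens: (num_subjects, dim) embeddings from an external subject bank
    Returns a (num_tokens, dim) subject-conditioned feature (projections omitted).
    """
    d_k = frame_tokens.shape[-1]
    scores = frame_tokens @ subject_tokens.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)          # attention over subjects
    return weights @ subject_tokens             # subject-informed features


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(16, 64))    # 16 spatial tokens (hypothetical)
    subjects = rng.normal(size=(4, 64))   # 4 external subject embeddings
    out = subject_cross_attention(frames, subjects)
    print(out.shape)  # (16, 64)
```

In a real diffusion backbone this would sit inside each transformer block with learned query/key/value projections; the point here is only that external subject embeddings can steer generation without changing the number of denoiser tokens.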
📝 Abstract
Progress in autonomous driving relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely labeled data for autonomous driving applications and present SubjectDrive, the first model shown to scale generative data production in a way that can continuously improve autonomous driving applications. We investigate the impact of scaling up the quantity of generative data on the performance of downstream perception models and find that enhancing data diversity plays a crucial role in effectively scaling generative data production. We therefore develop a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources to produce varied and useful data. Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field.