🤖 AI Summary
To address the bottleneck of autonomous driving's reliance on large-scale, costly annotated datasets, this paper proposes the first diffusion model capable of scalably generating high-quality, freely annotated driving data. Methodologically, it introduces a novel subject-control mechanism that integrates multi-source external data for precise content control. By combining a cross-modal subject encoder, controllable spatiotemporal conditioning, and multi-source alignment, the model enables joint scaling of data volume and diversity, demonstrating for the first time that synthetically generated data can meaningfully drive iterative model improvement. Experiments show that models trained solely on generated data reach 78–92% of the performance of real-data training on BEV segmentation and 3D detection tasks, and that doubling the synthetic dataset yields an average mAP gain of 4.3%, significantly outperforming existing baselines.
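The summary does not specify how the subject-control mechanism injects external subject data into the diffusion model. A common pattern for this kind of conditioning is cross-attention, where the denoiser's spatial tokens attend over embeddings of the external subjects. The sketch below illustrates only that generic pattern, not SubjectDrive's actual architecture; all names and shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def subject_cross_attention(frame_tokens, subject_tokens):
    """Toy cross-attention: each denoiser token attends over subject embeddings.

    frame_tokens:   (num_tokens, dim)  spatial tokens inside the denoiser
    subject_tokens: (num_subjects, dim) embeddings from an external subject bank
    Returns a (num_tokens, dim) subject-conditioned feature (projections omitted).
    """
    d_k = frame_tokens.shape[-1]
    scores = frame_tokens @ subject_tokens.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)          # attention over subjects
    return weights @ subject_tokens             # subject-informed features


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(16, 64))    # 16 spatial tokens (hypothetical)
    subjects = rng.normal(size=(4, 64))   # 4 external subject embeddings
    out = subject_cross_attention(frames, subjects)
    print(out.shape)  # (16, 64)
```

In a real diffusion backbone this would sit inside each transformer block with learned query/key/value projections; the point here is only that external subject embeddings can steer generation without changing the number of denoiser tokens.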
📝 Abstract
Progress in autonomous driving relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely labeled data for autonomous driving applications and present SubjectDrive, the first model shown to scale generative data production in a way that can continuously improve autonomous driving applications. We investigate the impact of scaling up the quantity of generative data on the performance of downstream perception models and find that enhancing data diversity plays a crucial role in effectively scaling generative data production. We therefore develop a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources to produce varied and useful data. Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field.