Preliminary Investigation into Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving

📅 2024-12-03

🏛️ arXiv.org

📈 Citations: 10

✨ Influential: 0

🤖 AI Summary

Problem: End-to-end autonomous driving methods are constrained by the limited scale of real-world driving data, which hinders systematic study of their scaling laws. Method: Using approximately 4 million demonstrations that span 23 scenario types and total over 30,000 hours, we study how existing imitation learning-based end-to-end driving paradigms scale with data. Evaluation covers 1,400 diverse driving demonstrations under stringent assessment conditions: 1,300 for open-loop evaluation and 100 for closed-loop simulation. Contribution/Results: (1) Driving performance follows a power-law relationship with the amount of training data. (2) A small increase in long-tailed data significantly improves performance in the corresponding scenarios. (3) Appropriate data scaling enables combinatorial generalization to novel scenes and actions. These results highlight the role of data scaling in improving model generalization across diverse autonomous driving scenarios.

📝 Abstract

The end-to-end autonomous driving paradigm has recently attracted lots of attention due to its scalability. However, existing methods are constrained by the limited scale of real-world data, which hinders a comprehensive exploration of the scaling laws associated with end-to-end autonomous driving. To address this issue, we collected substantial data from various driving scenarios and behaviors and conducted an extensive study on the scaling laws of existing imitation learning-based end-to-end autonomous driving paradigms. Specifically, approximately 4 million demonstrations from 23 different scenario types were gathered, amounting to over 30,000 hours of driving demonstrations. We performed open-loop evaluations and closed-loop simulation evaluations in 1,400 diverse driving demonstrations (1,300 for open-loop and 100 for closed-loop) under stringent assessment conditions. Through experimental analysis, we discovered that (1) the performance of the driving model exhibits a power-law relationship with the amount of training data; (2) a small increase in the quantity of long-tailed data can significantly improve the performance for the corresponding scenarios; (3) appropriate scaling of data enables the model to achieve combinatorial generalization in novel scenes and actions. Our results highlight the critical role of data scaling in improving the generalizability of models across diverse autonomous driving scenarios, assuring safe deployment in the real world. Project repository: https://github.com/ucaszyp/Driving-Scaling-Law
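A power-law relationship between error and training-set size appears as a straight line in log-log space, so its exponent can be estimated with an ordinary linear fit. The following is a minimal sketch under stated assumptions, not the authors' analysis code: the function name `fit_power_law`, the functional form `error ≈ a * n**(-b)`, and the data points are all hypothetical, and the values are synthetic rather than results from the paper.

```python
import numpy as np


def fit_power_law(n_samples, errors):
    """Fit error ≈ a * n**(-b) by least squares in log-log space.

    Returns (a, b). Hypothetical helper for illustration only.
    """
    log_n = np.log(np.asarray(n_samples, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # In log space the model is linear: log(error) = log(a) - b * log(n).
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(log_n, log_e, deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic points generated from a known power law (a=5.0, b=0.2).
# These are NOT measurements from the paper.
sizes = np.array([1e4, 1e5, 1e6, 4e6])
synthetic_error = 5.0 * sizes ** -0.2

a, b = fit_power_law(sizes, synthetic_error)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the synthetic points lie exactly on a power law, the fit recovers the generating parameters. With real, noisy metrics the fitted exponent would only be an estimate, and a log-log plot of the residuals is a quick check on whether a power law is a reasonable description at all.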

Problem

Research questions and friction points this paper is trying to address.

Investigates data scaling laws for imitation learning in autonomous driving

Explores how data distribution affects model performance in driving scenarios

Analyzes combinatorial generalization through appropriate data scaling techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Collected approximately 4 million demonstrations (over 30,000 hours) from 23 scenario types

Analyzed scaling laws through open-loop and closed-loop simulation evaluations on 1,400 driving demonstrations

Showed that a small increase in long-tailed data significantly improves performance in the corresponding scenarios

🔎 Similar Papers

SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control