🤖 AI Summary
This work addresses joint motion prediction and planning for autonomous driving, investigating empirical scaling laws of encoder-decoder autoregressive Transformer models. Method: Leveraging 500,000 hours of real-world driving data, we systematically analyze how model scale affects training loss and closed-loop performance. We find that, for compute-optimal training, model size should grow 1.5x as fast as dataset size, and we propose an inference-time strategy that samples and clusters model outputs to boost small-model performance. Contribution/Results: We establish the first systematic scaling law for this domain, demonstrating that closed-loop metrics improve as a power law with the training compute budget and that training loss correlates strongly with closed-loop performance. Crucially, we show that generalizable trajectory data from surrounding vehicles improves ego-vehicle planning capability. These findings provide quantitative guidance for efficient large-model training and lightweight deployment in autonomous driving systems.
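The power-law relationship between compute and loss described above can be illustrated with a minimal fit in log-log space. This is a generic sketch, not the paper's fitting procedure: the `(compute, loss)` points below are synthetic, and the exponent is chosen arbitrarily for the demonstration.

```python
import numpy as np

# Hypothetical illustration: fit a power law L(C) = a * C**b (b < 0)
# to (compute, loss) pairs, as scaling-law studies do for training loss
# versus total compute. All numbers here are synthetic, not from the paper.
compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs (synthetic)
loss = 5.0 * compute ** -0.05                 # exact power law for the demo

# A power law is a straight line in log-log space: log L = log a + b * log C,
# so an ordinary least-squares line fit recovers the exponent and prefactor.
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)
print(a, b)  # recovers a ≈ 5.0, b ≈ -0.05
```

Because the synthetic data lies exactly on a power law, the fit recovers the parameters to numerical precision; with real measurements the residuals quantify how well the power-law form holds.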
📝 Abstract
We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models on the task of joint motion forecasting and planning in the autonomous driving domain. Using a dataset of 500 thousand hours of driving, we demonstrate that, similar to language modeling, model performance improves as a power-law function of the total compute budget, and we observe a strong correlation between model training loss and model evaluation metrics. Most interestingly, closed-loop metrics also improve with scaling, which has important implications for the suitability of open-loop metrics for model development and hill climbing. We also study the optimal scaling of the number of transformer parameters and the training data size for a training compute-optimal model. We find that as the training compute budget grows, optimal scaling requires increasing the model size 1.5x as fast as the dataset size. We also study inference-time compute scaling, where we observe that sampling and clustering the output of smaller models makes them competitive with larger models, up to a crossover point beyond which a larger model becomes more inference-compute efficient. Overall, our experimental results demonstrate that optimizing the training and inference-time scaling properties of motion forecasting and planning models is a key lever for improving their performance to address a wide variety of driving scenarios. Finally, we briefly study the utility of training on general logged driving data of other agents to improve the performance of the ego-agent, an important research area for addressing the scarcity of robotics data for training large-capacity models.
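The inference-time strategy described above — drawing many samples from a small model and clustering them into a few representative modes — can be sketched as follows. This is a toy illustration under stated assumptions: the "model samples" are synthetic 2D trajectory endpoints around two modes, and the clustering is a plain k-means, not necessarily the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectories(n):
    # Stand-in for autoregressive sampling from a small model: returns n
    # synthetic trajectory endpoints scattered around two driving modes
    # (e.g., "continue straight" vs. "turn"). Purely illustrative data.
    modes = np.array([[10.0, 0.0], [8.0, 4.0]])
    choices = rng.integers(0, 2, size=n)
    return modes[choices] + rng.normal(scale=0.3, size=(n, 2))

def kmeans(points, k, iters=20):
    # Minimal Lloyd's k-means: the k cluster centers serve as the
    # representative predictions kept after sampling.
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

samples = sample_trajectories(256)   # many cheap samples from a small model
centers, _ = kmeans(samples, k=2)    # a few representative modes to output
```

The design point this illustrates is the trade-off the abstract describes: extra inference compute (more samples plus clustering) substitutes for model capacity up to a crossover, beyond which a larger model is the more compute-efficient choice.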