🤖 AI Summary
This work addresses joint motion prediction and planning for autonomous driving, investigating empirical scaling laws of encoder-decoder autoregressive Transformer models. Method: Leveraging 500,000 hours of real-world driving data, we systematically analyze how model scale affects training loss and closed-loop performance. We find that, for compute-optimal training, model size should grow 1.5x as fast as dataset size, and we propose an inference-time strategy that samples and clusters model outputs to boost small-model performance. Contribution/Results: We establish the first systematic scaling law for this domain, demonstrating that closed-loop metrics improve as a power law with the training compute budget and that training loss correlates strongly with closed-loop performance. Crucially, we show that generalizable trajectory data from surrounding vehicles improves ego-vehicle planning capability. These findings provide quantitative guidance for efficient large-model training and lightweight deployment in autonomous driving systems.
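The power-law relationship between compute and loss described above can be illustrated with a minimal fit in log-log space. This is a generic sketch, not the paper's fitting procedure: the `(compute, loss)` points below are synthetic, and the exponent is chosen arbitrarily for the demonstration.

```python
import numpy as np

# Hypothetical illustration: fit a power law L(C) = a * C**b (b < 0)
# to (compute, loss) pairs, as scaling-law studies do for training loss
# versus total compute. All numbers here are synthetic, not from the paper.
compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs (synthetic)
loss = 5.0 * compute ** -0.05                 # exact power law for the demo

# A power law is a straight line in log-log space: log L = log a + b * log C,
# so an ordinary least-squares line fit recovers the exponent and prefactor.
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)
print(a, b)  # recovers a ≈ 5.0, b ≈ -0.05
```

Because the synthetic data lies exactly on a power law, the fit recovers the parameters to numerical precision; with real measurements the residuals quantify how well the power-law form holds.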
📝 Abstract
We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models on the task of joint motion forecasting and planning in the autonomous driving domain. Using a dataset of 500 thousand hours of driving, we demonstrate that, similar to language modeling, model performance improves as a power-law function of the total compute budget, and we observe a strong correlation between model training loss and model evaluation metrics. Most interestingly, closed-loop metrics also improve with scaling, which has important implications for the suitability of open-loop metrics for model development and hill climbing. We also study the optimal scaling of the number of transformer parameters and the training data size for a training compute-optimal model. We find that as the training compute budget grows, optimal scaling requires increasing the model size 1.5x as fast as the dataset size. We also study inference-time compute scaling, where we observe that sampling and clustering the output of smaller models makes them competitive with larger models, up to a crossover point beyond which a larger model becomes more inference-compute efficient. Overall, our experimental results demonstrate that optimizing the training and inference-time scaling properties of motion forecasting and planning models is a key lever for improving their performance to address a wide variety of driving scenarios. Finally, we briefly study the utility of training on general logged driving data of other agents to improve the performance of the ego-agent, an important research area for addressing the scarcity of robotics data for training large-capacity models.
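The inference-time strategy described above — drawing many samples from a small model and clustering them into a few representative modes — can be sketched as follows. This is a toy illustration under stated assumptions: the "model samples" are synthetic 2D trajectory endpoints around two modes, and the clustering is a plain k-means, not necessarily the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectories(n):
    # Stand-in for autoregressive sampling from a small model: returns n
    # synthetic trajectory endpoints scattered around two driving modes
    # (e.g., "continue straight" vs. "turn"). Purely illustrative data.
    modes = np.array([[10.0, 0.0], [8.0, 4.0]])
    choices = rng.integers(0, 2, size=n)
    return modes[choices] + rng.normal(scale=0.3, size=(n, 2))

def kmeans(points, k, iters=20):
    # Minimal Lloyd's k-means: the k cluster centers serve as the
    # representative predictions kept after sampling.
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

samples = sample_trajectories(256)   # many cheap samples from a small model
centers, _ = kmeans(samples, k=2)    # a few representative modes to output
```

The design point this illustrates is the trade-off the abstract describes: extra inference compute (more samples plus clustering) substitutes for model capacity up to a crossover, beyond which a larger model is the more compute-efficient choice.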