🤖 AI Summary
This work addresses the trade-off between accuracy and coverage in end-to-end autonomous driving planners, which score either a large static trajectory vocabulary or a small set of dynamically generated proposals. The authors propose SparseDriveV2, an end-to-end planning framework with a factorized trajectory vocabulary that decouples geometric paths from velocity profiles, and a two-stage scoring mechanism: coarse factorized scoring over paths and velocity profiles, followed by fine-grained scoring of a small set of composed trajectories. The method uses no dynamic proposal generation. Instead, it relies on a static vocabulary dense enough to cover the action space, with a lightweight ResNet-34 backbone. On NAVSIM, the approach attains 92.0 PDMS and 90.1 EPDMS; on Bench2Drive, it achieves a Driving Score of 89.15 and a Success Rate of 70.00%. These results suggest that a static vocabulary can be highly competitive when it is well structured and sufficiently dense.
📝 Abstract
End-to-end multi-modal planning has been widely adopted to model the uncertainty of driving behavior, typically by scoring candidate trajectories and selecting the optimal one. Existing approaches generally fall into two categories: scoring a large static trajectory vocabulary, or scoring a small set of dynamically generated proposals. While static vocabularies often suffer from coarse discretization of the action space, dynamic proposals provide finer-grained precision and have shown stronger empirical performance on existing benchmarks. However, it remains unclear whether dynamic generation is fundamentally necessary, or whether static vocabularies can achieve comparable performance when they are dense enough to cover the action space. In this work, we start with a systematic scaling study of Hydra-MDP, a representative scoring-based method, revealing that performance consistently improves as trajectory anchors become denser, without saturating before computational constraints are reached. Motivated by this observation, we propose SparseDriveV2 to push the performance boundary of scoring-based planning through two complementary innovations: (1) a scalable vocabulary representation with a factorized structure that decomposes trajectories into geometric paths and velocity profiles, enabling combinatorial coverage of the action space, and (2) a scalable scoring strategy with coarse factorized scoring over paths and velocity profiles followed by fine-grained scoring on a small set of composed trajectories. By combining these two techniques, SparseDriveV2 achieves 92.0 PDMS and 90.1 EPDMS on NAVSIM, and 89.15 Driving Score and 70.00 Success Rate on Bench2Drive, using a lightweight ResNet-34 backbone. Code and model are released at https://github.com/swc-17/SparseDriveV2.
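The factorized vocabulary and coarse-to-fine scoring described above can be illustrated with a minimal sketch. It is not the released SparseDriveV2 implementation: the function names (`point_at`, `compose`, `top_k`, `select_trajectory`), the polyline path representation, the per-step speed lists, the time step `dt`, and the `k_path`/`k_vel` cut-offs are all assumptions made for illustration, and plain score lists plus a user-supplied `fine_score` callable stand in for the learned scorers.

```python
import math
from itertools import product


def point_at(path, s):
    """Return the point at arc length s along a polyline, clamped to its end."""
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and s <= seg:
            r = s / seg
            return (x0 + r * (x1 - x0), y0 + r * (y1 - y0))
        s -= seg
    return path[-1]


def compose(path, speeds, dt=0.5):
    """Compose a trajectory by travelling along `path` at the given speeds.

    `path` is a list of (x, y) points; `speeds` holds one speed per time step.
    """
    s, traj = 0.0, []
    for v in speeds:
        s += v * dt  # distance covered so far
        traj.append(point_at(path, s))
    return traj


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def select_trajectory(paths, profiles, path_scores, profile_scores,
                      fine_score, k_path=2, k_vel=2):
    """Coarse-to-fine selection over a factorized vocabulary.

    Stage 1 (coarse): keep the top paths and top velocity profiles,
    scored independently of each other.
    Stage 2 (fine): compose only the surviving pairs into trajectories
    and score each composed trajectory with `fine_score`.
    Returns (path_index, profile_index, trajectory) of the best candidate.
    """
    keep_p = top_k(path_scores, k_path)
    keep_v = top_k(profile_scores, k_vel)
    candidates = [(i, j, compose(paths[i], profiles[j]))
                  for i, j in product(keep_p, keep_v)]
    return max(candidates, key=lambda c: fine_score(c[2]))
```

With `P` paths and `V` velocity profiles, the implicit vocabulary holds `P * V` trajectories, but the coarse stage scores only `P + V` items and the fine stage only `k_path * k_vel` composed trajectories. That asymmetry is what lets the vocabulary grow combinatorially without a matching growth in scoring cost.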