🤖 AI Summary
This work investigates whether the model scaling paradigm applies to 3D perception for autonomous driving, where heterogeneous multimodal sensor data must be fused and complex scenes understood. The authors propose STELLAR, a scalable architecture built on the Sparse Window Transformer that takes LiDAR, radar, camera, and map-prior inputs within a single framework. Models of up to 500 million parameters are trained on 50 million driving examples. The study systematically analyzes how model size, data volume, and compute affect perception performance, and reports empirical scaling trends. On the Waymo Open Dataset challenge, STELLAR outperforms prior approaches by a large margin, establishing a new state of the art and suggesting that large-scale training is a promising path for advancing autonomous driving perception.
📝 Abstract
Model scaling has demonstrated remarkable success through large-scale training on diverse datasets. It remains an open question whether the same paradigm applies to autonomous driving perception systems, which pose unique challenges such as fusing heterogeneous sensor data and the need for sophisticated 3D spatial understanding. To bridge this gap, we present a comprehensive study that systematically analyzes the impact of scale on these systems. We build our STELLAR model on the Sparse Window Transformer, extending its input modalities to include LiDAR, radar, camera, and map prior. We train models of up to 500 million parameters on a large-scale dataset of 50 million driving examples. Our large-scale experiments reveal empirical scaling trends that connect model performance to model size, data, and compute. The resulting model establishes a new state of the art on the Waymo Open Dataset challenge, outperforming prior art by a large margin. Our work demonstrates that large-scale training is a highly promising path for advancing the capabilities of perception models for autonomous driving.
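As a rough illustration of what an empirical scaling trend can look like, the sketch below fits a power law relating an error metric to model size. The power-law form, the function name `fit_power_law`, and all numbers are assumptions made for this example; the abstract does not state the functional form the authors use, and the values are synthetic, not results from the paper.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y ~ a * x**(-b) by least squares in log-log space; return (a, b).

    Hypothetical helper for illustration; not the authors' fitting procedure.
    """
    # A power law is a straight line in log-log space:
    # log(y) = log(a) - b * log(x)
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic values generated from an exact power law (NOT data from the paper).
model_sizes = np.array([1e6, 1e7, 1e8, 5e8])  # parameter counts
errors = 2.0 * model_sizes ** -0.1            # made-up error metric

a, b = fit_power_law(model_sizes, errors)
print(f"fitted prefactor a={a:.3f}, exponent b={b:.3f}")
```

The same fit could be repeated with dataset size or training compute on the x-axis to compare exponents across the three axes the abstract names. Real measurements are noisy, so the fit would be approximate rather than exact as it is here.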