🤖 AI Summary
This work addresses the fragmentation among perception, prediction, and planning tasks in vision-based autonomous driving by proposing DLWM, an efficient, unified Gaussian-centric pretraining framework. The approach has two stages. In the first stage, 3D semantic Gaussians are predicted from queries through self-supervised reconstruction of multi-view semantic and depth images. In the second stage, two latent world models are trained separately for temporal modeling: one guided by Gaussian flow, for occupancy perception and forecasting, and one guided by ego-vehicle planning, for motion planning. The method is presented as the first to integrate dual latent world models into Gaussian representations, enabling pretraining for 3D occupancy perception, 4D occupancy forecasting, and motion planning. Experiments on the SurroundOcc and nuScenes benchmarks show significant gains over existing Gaussian-centric approaches on all three tasks.
📝 Abstract
Vision-based autonomous driving has gained much attention due to its low cost and excellent performance. Compared with dense BEV (Bird's Eye View) or sparse query models, Gaussian-centric methods offer a comprehensive yet sparse representation by describing the scene with 3D semantic Gaussians. In this paper, we introduce DLWM, a novel two-stage paradigm with Dual Latent World Models designed to enable holistic Gaussian-centric pre-training for autonomous driving. In the first stage, DLWM predicts 3D Gaussians from queries through self-supervised reconstruction of multi-view semantic and depth images. In the second stage, building on the resulting fine-grained contextual features, two latent world models are trained separately for temporal feature learning: Gaussian-flow-guided latent prediction for the downstream occupancy perception and forecasting tasks, and ego-planning-guided latent prediction for motion planning. Extensive experiments on the SurroundOcc and nuScenes benchmarks demonstrate that DLWM achieves significant performance gains across Gaussian-centric 3D occupancy perception, 4D occupancy forecasting, and motion planning tasks.
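To make the idea of a flow-guided Gaussian scene representation more concrete, the following is a minimal NumPy sketch and not DLWM's implementation. It assumes axis-aligned Gaussians (no rotation), takes the per-Gaussian flow as a given input rather than predicting it with a learned world model, and splats class scores onto voxel centers with a plain opacity-weighted sum. The names `SemanticGaussians`, `apply_flow`, and `splat_to_voxels` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SemanticGaussians:
    """A toy set of N 3D semantic Gaussians (axis-aligned: an assumption)."""
    means: np.ndarray      # (N, 3) Gaussian centers
    scales: np.ndarray     # (N, 3) per-axis standard deviations
    opacity: np.ndarray    # (N,)   values in [0, 1]
    semantics: np.ndarray  # (N, C) per-class scores


def apply_flow(g: SemanticGaussians, flow: np.ndarray) -> SemanticGaussians:
    """Shift every Gaussian center by its 3D flow vector, shape (N, 3).

    In a learned model the flow would be predicted; here it is an input.
    """
    return SemanticGaussians(g.means + flow, g.scales, g.opacity, g.semantics)


def splat_to_voxels(g: SemanticGaussians, centers: np.ndarray) -> np.ndarray:
    """Accumulate opacity-weighted class scores at voxel centers.

    centers: (V, 3) voxel center coordinates. Returns (V, C) scores.
    """
    # Normalized offset from each Gaussian to each voxel: (V, N, 3)
    diff = (centers[:, None, :] - g.means[None, :, :]) / g.scales[None, :, :]
    # Unnormalized Gaussian weight of each Gaussian at each voxel: (V, N)
    weight = g.opacity[None, :] * np.exp(-0.5 * (diff ** 2).sum(axis=-1))
    return weight @ g.semantics


# Two Gaussians with one class each; the first one moves 1 unit along x.
gaussians = SemanticGaussians(
    means=np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),
    scales=np.full((2, 3), 0.5),
    opacity=np.ones(2),
    semantics=np.eye(2),
)
flow = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
future = apply_flow(gaussians, flow)
future_scores = splat_to_voxels(future, np.array([[1.0, 0.0, 0.0]]))
future_labels = future_scores.argmax(axis=1)  # predicted class per voxel
```

Moving the Gaussians and then splatting them gives a crude forecast of future occupancy from the current scene, which is the intuition behind using Gaussian flow as a guide for occupancy forecasting. A full model would also handle rotation, learned features, and empty-space classification, none of which are specified in the abstract.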