🤖 AI Summary
This work addresses two challenges in LiDAR world modeling, the unordered nature of LiDAR point clouds and the difficulty of distinguishing dynamic objects from static structures, by proposing a generative LiDAR world model based on deformable Mamba. The method first designs a scene tokenizer tailored to LiDAR scanning characteristics to compress point cloud sequences into compact tokens. It then introduces an unsupervised module that disentangles dynamic and static features. Finally, it constructs a three-path deformable Mamba architecture that uses selective scanning and an adaptive gating fusion mechanism to model spatiotemporal environmental dynamics. The resulting model supports autonomous rollout and “what-if” scenario generation, and the authors report state-of-the-art results across multiple benchmarks, with improved generation fidelity and imaginative capability.
📝 Abstract
World models, which simulate environmental dynamics and generate sensor observations, are gaining increasing attention in autonomous driving. However, progress in LiDAR-based world models has lagged behind that of models built on camera videos or occupancy data, primarily due to two core challenges: the inherent disorder of LiDAR point clouds and the difficulty of distinguishing dynamic objects from static structures. To address these issues, we propose GEM, a Generative LiDAR world model built on a deformable Mamba architecture that significantly improves fidelity and imaginative capability. Specifically, exploiting the structural similarity between sequential laser scanning and Mamba's processing mechanism, we first tokenize LiDAR sweeps into compact representations via a custom LiDAR scene tokenizer. After a dynamic-static separator disentangles the tokenized features without supervision, a tri-path deformable Mamba performs selective scanning and adaptive gating fusion over the disentangled features, yielding a stronger spatiotemporal understanding of how the world evolves. Optionally, a planner and a BEV layout controller can be integrated to explore the model's capability for autonomous rollout and its potential to generate “what-if” scenarios. Extensive experiments show that GEM achieves state-of-the-art performance across diverse benchmarks and evaluation settings, demonstrating its effectiveness. Project page: https://github.com/wuyang98/GEM.
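To make the idea of adaptive gating fusion over three paths concrete, here is a minimal sketch in plain Python. It is not GEM's implementation: the abstract does not specify which features flow through each path or how the gates are computed, so the softmax gating, the per-path gate weights, and the names `softmax` and `gated_fusion` are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(paths, gate_weights):
    """Fuse equal-length feature vectors, one per path, with input-dependent gates.

    Assumption (not from the paper): each path gets a scalar gate logit equal to
    the dot product of its features with a weight vector, and the logits are
    normalized with a softmax so that the gates sum to 1.
    """
    logits = [
        sum(w * x for w, x in zip(weights, features))
        for weights, features in zip(gate_weights, paths)
    ]
    gates = softmax(logits)
    dim = len(paths[0])
    # Weighted sum of the path features, element by element.
    fused = [
        sum(gate * features[i] for gate, features in zip(gates, paths))
        for i in range(dim)
    ]
    return fused, gates


# Hypothetical features from three scanning paths for a single token.
paths = [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]]
# Placeholder gate weights; in a trained model these would be learned.
gate_weights = [[0.5, 0.0], [0.0, 0.1], [0.0, 0.0]]
fused, gates = gated_fusion(paths, gate_weights)
print(gates, fused)
```

Because the gates depend on the features themselves, the fusion can weight one path more heavily for some tokens and less for others, which is the general motivation for gating rather than simply averaging the paths.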