SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing occupancy world models predominantly rely on static grids and fixed embeddings, which limits their adaptability to the dynamic and continuous nature of real-world scenes and impairs perceptual flexibility. To address this, the authors propose a 4D semantic occupancy world model based on sparse dynamic queries. It employs learnable spatiotemporal queries for adaptive scene modeling; integrates range-adaptive perception and state-conditioned forecasting modules, using regression rather than classification to align predictions with the continuous spatiotemporal evolution of the scene; and incorporates vehicle-state modulation with temporal-aware self-scheduling training. Evaluated on benchmarks including nuScenes, the model achieves state-of-the-art performance across perception, forecasting, and planning tasks. Ablation studies and visualizations confirm its enhanced flexibility, dynamic adaptability, and computational efficiency.

📝 Abstract
Semantic occupancy has emerged as a powerful representation in world models for its ability to capture rich spatial semantics. However, most existing occupancy world models rely on static and fixed embeddings or grids, which inherently limit the flexibility of perception. Moreover, their "in-place classification" over grids exhibits a potential misalignment with the dynamic and continuous nature of real scenarios. In this paper, we propose SparseWorld, a novel 4D occupancy world model that is flexible, adaptive, and efficient, powered by sparse and dynamic queries. We propose a Range-Adaptive Perception module, in which learnable queries are modulated by the ego vehicle states and enriched with temporal-spatial associations to enable extended-range perception. To effectively capture the dynamics of the scene, we design a State-Conditioned Forecasting module, which replaces classification-based forecasting with a regression-guided formulation, precisely aligning the dynamic queries with the continuity of the 4D environment. In addition, we devise a Temporal-Aware Self-Scheduling training strategy to enable smooth and efficient training. Extensive experiments demonstrate that SparseWorld achieves state-of-the-art performance across perception, forecasting, and planning tasks. Comprehensive visualizations and ablation studies further validate the advantages of SparseWorld in terms of flexibility, adaptability, and efficiency. The code is available at https://github.com/MSunDYY/SparseWorld.
Problem

Research questions and friction points this paper is trying to address.

Overcoming static embedding limitations in semantic occupancy world models
Aligning dynamic queries with continuous 4D environmental representations
Enhancing perception and forecasting through adaptive sparse modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

SparseWorld uses sparse dynamic queries for 4D occupancy modeling
Range-Adaptive Perception module enables extended-range scene understanding
State-Conditioned Forecasting replaces classification with regression formulation
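
The contrast between in-place classification over a grid and regression on sparse queries can be illustrated with a toy sketch. This is not SparseWorld's implementation: the `Query` class, `forecast_by_regression`, `voxelize`, and the hand-written offsets are all hypothetical names and values chosen for illustration, standing in for whatever the learned model would predict. The idea shown is only that a query carrying a continuous position can move by a sub-voxel amount and keep that motion across steps, whereas a grid cell can only change its label.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Query:
    # Hypothetical sparse query: a continuous 3D position plus a semantic label.
    x: float
    y: float
    z: float
    label: str


def forecast_by_regression(
    queries: List[Query], offsets: List[Tuple[float, float, float]]
) -> List[Query]:
    """Move each query by a continuous (dx, dy, dz) offset.

    In a learned model the offsets would be regressed by a network;
    here they are supplied by hand (an assumption of this sketch).
    """
    return [
        Query(q.x + dx, q.y + dy, q.z + dz, q.label)
        for q, (dx, dy, dz) in zip(queries, offsets)
    ]


def voxelize(
    queries: List[Query], voxel_size: float
) -> Dict[Tuple[int, int, int], str]:
    """Rasterize queries into a sparse occupancy grid keyed by voxel index."""
    grid: Dict[Tuple[int, int, int], str] = {}
    for q in queries:
        key = (
            math.floor(q.x / voxel_size),
            math.floor(q.y / voxel_size),
            math.floor(q.z / voxel_size),
        )
        grid[key] = q.label
    return grid


# One "car" query moving 0.5 m per step along x, with 1 m voxels.
step0 = [Query(0.25, 0.0, 0.0, "car")]
step1 = forecast_by_regression(step0, [(0.5, 0.0, 0.0)])
step2 = forecast_by_regression(step1, [(0.5, 0.0, 0.0)])

print(voxelize(step1, 1.0))  # the car is still inside voxel (0, 0, 0)
print(voxelize(step2, 1.0))  # the accumulated motion now crosses into voxel (1, 0, 0)
```

After the first step the rasterized grid is unchanged, yet the query has retained its half-voxel displacement, so the second step carries it into the neighboring voxel. A per-cell classifier that only sees the grid would have no continuous state in which to store that intermediate motion.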
Authors

Chenxu Dang
Huazhong University of Science and Technology
Computer Vision, Autonomous Driving
Haiyan Liu
Lenovo
Guangjun Bao
Lenovo
Pei An
Huazhong University of Science and Technology
Xinyue Tang
Lenovo
Jie Ma
Huazhong University of Science and Technology
Bingchuan Sun
Lenovo
Yan Wang
Institute for AI Industry Research (AIR), Tsinghua University