Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models

📅 2025-10-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
World models for vision-centric end-to-end autonomous driving spend capacity on redundantly modeling static backgrounds and are weak at predicting how the dynamic scene evolves. Method: We propose the Implicit Residual World Model (IR-WM), which combines a bird’s-eye-view (BEV) representation, implicit residual prediction, temporal prior modeling, and semantic and dynamic alignment to jointly perform 4D occupancy forecasting and trajectory planning from camera inputs. Contributions/Results: IR-WM introduces (i) residual BEV temporal modeling, which predicts only the changes in the scene rather than reconstructing the full scene; (ii) an alignment module that calibrates semantic and dynamic misalignments to limit error accumulation over time; and (iii) a study of forecasting-planning coupling schemes, showing that the implicit future state produced by the world model improves planning accuracy. Evaluated on nuScenes, IR-WM achieves state-of-the-art performance in both 4D occupancy forecasting and trajectory planning, significantly improving long-horizon prediction stability and planning accuracy.

📝 Abstract
End-to-end autonomous driving systems increasingly rely on vision-centric world models to understand and predict their environment. However, a common inefficiency in these models is the full reconstruction of future scenes, which expends significant capacity on redundantly modeling static backgrounds. To address this, we propose IR-WM, an Implicit Residual World Model that focuses on modeling the current state and evolution of the world. IR-WM first establishes a robust bird's-eye-view (BEV) representation of the current state from the visual observation. It then leverages the BEV features from the previous timestep as a strong temporal prior and predicts only the "residual", i.e., the changes conditioned on the ego-vehicle's actions and scene context. To alleviate error accumulation over time, we further apply an alignment module to calibrate semantic and dynamic misalignments. Moreover, we investigate different forecasting-planning coupling schemes and demonstrate that the implicit future state generated by world models substantially improves planning accuracy. On the nuScenes benchmark, IR-WM achieves top performance in both 4D occupancy forecasting and trajectory planning.

Problem

Research questions and friction points this paper is trying to address.

Focuses on modeling dynamic changes in autonomous driving environments
Reduces redundant reconstruction of static backgrounds in world models
Improves planning accuracy through implicit future state generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit Residual World Model focuses on state changes
Uses BEV features as temporal prior for residuals
Alignment module corrects semantic and dynamic misalignments
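
The residual idea in the points above can be illustrated with a toy rollout. This is only a sketch under stated assumptions, not the paper's architecture: the BEV "features" are one scalar per grid cell, the residual predictor is a hand-written rule instead of a learned network, and the names `shift_row`, `toy_residual_predictor`, `residual_step`, `rollout`, `dynamic_mask`, and `motion` are all hypothetical. The alignment module is not modeled here. The sketch shows only that the next state is the previous state plus a predicted change, and that static cells receive a zero residual.

```python
from typing import List

Grid = List[List[float]]  # toy BEV map: one scalar feature per cell (assumption)


def shift_row(row: List[float], k: int) -> List[float]:
    """Shift values right by k cells, padding with zeros (no wrap-around)."""
    out = [0.0] * len(row)
    for i, v in enumerate(row):
        j = i + k
        if 0 <= j < len(row):
            out[j] = v
    return out


def toy_residual_predictor(prev_bev: Grid, dynamic_mask: Grid, motion: int) -> Grid:
    """Hypothetical stand-in for a learned residual head.

    Only cells flagged as dynamic move by `motion` columns; the residual is
    (moved dynamic content) - (old dynamic content), so static cells get 0.
    """
    residual = []
    for row, mask_row in zip(prev_bev, dynamic_mask):
        dyn = [v * m for v, m in zip(row, mask_row)]
        moved = shift_row(dyn, motion)
        residual.append([a - b for a, b in zip(moved, dyn)])
    return residual


def residual_step(prev_bev: Grid, residual: Grid) -> Grid:
    """Next state = previous state (temporal prior) + predicted change."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prev_bev, residual)]


def rollout(bev: Grid, dynamic_mask: Grid, motions: List[int]) -> List[Grid]:
    """Autoregressive forecast: each step reuses the previous predicted state."""
    states = []
    for motion in motions:
        residual = toy_residual_predictor(bev, dynamic_mask, motion)
        bev = residual_step(bev, residual)
        # Keep the mask in sync with the object that just moved.
        dynamic_mask = [shift_row(m, motion) for m in dynamic_mask]
        states.append(bev)
    return states


# Row 0 holds one moving object; row 1 is static background.
bev0 = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
mask0 = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
future = rollout(bev0, mask0, motions=[1, 1])
```

In this toy setup the static row is carried forward untouched across both steps, while the object in row 0 advances one cell per step. Because each step builds on the previous prediction, any error in a residual would also be carried forward, which is the kind of accumulation the alignment module is described as addressing.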