GEM: Gaussian Evolution Model for Occupancy Forecasting and Motion Planning

📅 2026-05-17

🤖 AI Summary
Existing autonomous driving approaches face challenges in 3D semantic occupancy prediction and motion planning because of temporal discretization, error accumulation, and difficulty modeling continuous dynamics. This work proposes GEM, a non-autoregressive world model built on explicit, continuous 4D Gaussian primitives, which introduces continuous-time Gaussian evolution into occupancy prediction for the first time. By learning the dynamics of the Gaussian primitives, GEM can be queried at arbitrary timestamps, models scene evolution interpretably, and supports end-to-end motion planning. The method achieves state-of-the-art performance in future semantic occupancy forecasting while offering high temporal flexibility and strong planning capabilities.
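The summary's central idea, querying learned Gaussian dynamics at arbitrary timestamps, can be illustrated with the conditioning step commonly used for 4D Gaussian primitives. This is a minimal sketch under stated assumptions, not GEM's implementation: it assumes each primitive is a Gaussian over (x, y, z, t) with a full 4x4 covariance, and that querying at time t means taking the conditional 3D Gaussian plus a temporal weight. The names `Gaussian4D`, `Gaussian3D`, and `condition_on_time` are hypothetical, and the paper may parametrize geometry, temporal support, and motion differently.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Gaussian4D:
    """Hypothetical 4D (x, y, z, t) primitive; its 4x4 covariance is stored in blocks."""
    mean_xyz: Vec3
    mean_t: float
    cov_xyz: Mat3   # spatial block
    cov_xt: Vec3    # Cov(x,t), Cov(y,t), Cov(z,t)
    var_t: float    # Var(t) > 0; sets the temporal support
    opacity: float


@dataclass
class Gaussian3D:
    mean: Vec3
    cov: Mat3
    opacity: float  # base opacity scaled by the temporal weight


def condition_on_time(g: Gaussian4D, t: float) -> Gaussian3D:
    """Slice a 4D Gaussian at time t using standard Gaussian conditioning."""
    dt = t - g.mean_t
    # Cov(x,t) / Var(t) acts as a constant velocity for the conditional mean.
    velocity = tuple(c / g.var_t for c in g.cov_xt)
    mean = tuple(m + v * dt for m, v in zip(g.mean_xyz, velocity))
    # Schur complement: spatial spread left after conditioning on t.
    cov = tuple(
        tuple(g.cov_xyz[i][j] - g.cov_xt[i] * g.cov_xt[j] / g.var_t for j in range(3))
        for i in range(3)
    )
    # Unnormalized temporal marginal: the primitive fades in and out around mean_t.
    weight = math.exp(-0.5 * dt * dt / g.var_t)
    return Gaussian3D(mean, cov, g.opacity * weight)


g = Gaussian4D(
    mean_xyz=(0.0, 0.0, 0.0), mean_t=0.0,
    cov_xyz=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
    cov_xt=(0.25, 0.0, 0.0), var_t=0.25, opacity=0.9,
)
# Every timestamp is an independent query; nothing is rolled out step by step.
horizon = [condition_on_time(g, t) for t in (0.5, 1.0, 1.7, 3.0)]
```

Because each query is closed-form, an off-grid time such as 1.7 costs the same as an on-grid one, and an error at one timestamp is never fed into the next. In this parametrization the space-time covariance doubles as linear motion, which is one way "primitive motion" and "temporal support" can be read off each primitive directly.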
📝 Abstract
Future 3D semantic occupancy forecasting and motion planning are central to autonomous driving, as they require models to reason about how surrounding scenes evolve and how the ego vehicle should act. Existing occupancy world models commonly discretize scenes into latent embeddings, volumetric features, or quantized tokens, and forecast future states through fixed-step autoregressive generation. This limits temporal flexibility, obscures scene evolution, accumulates errors over long horizons, and poorly matches the continuous-time dynamics of real driving scenes. We propose GEM, a Gaussian Evolution Model for non-autoregressive occupancy world modeling, where driving scenes are represented as explicit continuous 4D Gaussian primitives with learned dynamics. Instead of rolling out future occupancy states step by step, GEM directly queries the Gaussian world representation at arbitrary timestamps and splats the corresponding conditional 3D Gaussians into semantic occupancy volumes. This enables efficient forecasting over the full horizon while retaining a compact and interpretable scene representation. By decoupling spatial geometry, temporal support, and primitive motion, GEM makes the predicted world easier to inspect, as each primitive's evolution can be followed continuously over time. The same representation also supports motion planning by predicting future ego trajectories from the learned Gaussian world. Extensive experiments show that GEM achieves state-of-the-art future semantic occupancy forecasting and strong motion planning performance, while providing flexible temporal querying.
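The abstract's splatting step turns time-conditioned 3D Gaussians into a semantic occupancy volume. The following is a minimal sketch of one common Gaussian-to-voxel scheme, assumed here for illustration: each voxel center accumulates opacity-weighted Gaussian densities and semantic scores, the highest-scoring class wins, and a low total density marks the voxel empty. `SemanticGaussian3D`, `splat_to_occupancy`, and the `empty_threshold` value are hypothetical; the abstract does not specify GEM's splatting kernel or how empty space is labeled.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class SemanticGaussian3D:
    """Hypothetical time-conditioned 3D Gaussian with per-class semantic scores."""
    mean: Vec3
    cov: Mat3
    opacity: float
    semantics: Tuple[float, ...]  # one score per class


def inv3(m: Mat3) -> Mat3:
    """Invert a 3x3 matrix via its adjugate; raises if it is (near-)singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("covariance is singular")
    return (
        ((e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det),
        ((f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det),
        ((d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det),
    )


def splat_to_occupancy(
    gaussians: Sequence[SemanticGaussian3D],
    voxel_centers: Sequence[Vec3],
    num_classes: int,
    empty_threshold: float = 0.05,  # assumed cutoff for "empty"
) -> List[int]:
    """Label each voxel with its argmax class, or -1 if total density is too low."""
    inv_covs = [inv3(g.cov) for g in gaussians]
    labels: List[int] = []
    for p in voxel_centers:
        scores = [0.0] * num_classes
        density = 0.0
        for g, icov in zip(gaussians, inv_covs):
            d = [p[k] - g.mean[k] for k in range(3)]
            # Squared Mahalanobis distance d^T Sigma^{-1} d.
            mahal = sum(d[r] * icov[r][c] * d[c] for r in range(3) for c in range(3))
            w = g.opacity * math.exp(-0.5 * mahal)
            density += w
            for k in range(num_classes):
                scores[k] += w * g.semantics[k]
        if density < empty_threshold:
            labels.append(-1)
        else:
            labels.append(scores.index(max(scores)))
    return labels
```

Run once per queried timestamp, this produces one occupancy volume per time without feeding any predicted volume back into the model. A practical version would vectorize over voxels and visit only the Gaussians whose support overlaps each voxel; the nested loops here favor readability.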
Problem

Research questions and friction points this paper is trying to address.

occupancy forecasting
motion planning
autonomous driving
scene evolution
continuous-time dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian primitives
non-autoregressive forecasting
4D scene representation
semantic occupancy
motion planning