🤖 AI Summary
Current autonomous driving world models suffer from limitations including a single state modality, short video sequences, coarse-grained action control, and the absence of explicit reward modeling—hindering joint modeling of state, action, and reward. This paper introduces a unified world model for autonomous driving: it enables pixel-level trajectory control via a panoramic Plücker ray representation; constructs a regularized, dense, and differentiable reward function through generative 3D occupancy prediction; and performs multimodal joint modeling over RGB, semantic, depth, and 3D occupancy inputs. The framework supports long-horizon autoregressive generation and closed-loop navigation evaluation. Experiments demonstrate state-of-the-art performance in video fidelity, action accuracy, and long-term stability, while significantly improving simulation capabilities for driving compliance and safety.
📝 Abstract
Autonomous driving world models are expected to work effectively across three core dimensions: state, action, and reward. Existing models, however, are typically restricted to limited state modalities, short video sequences, imprecise action control, and a lack of reward awareness. In this paper, we introduce OmniNWM, an omniscient panoramic navigation world model that addresses all three dimensions within a unified framework. For state, OmniNWM jointly generates panoramic videos of RGB, semantics, metric depth, and 3D occupancy. A flexible forcing strategy enables high-quality long-horizon auto-regressive generation. For action, we introduce a normalized panoramic Plücker ray-map representation that encodes input trajectories into pixel-level signals, enabling highly precise and generalizable control over panoramic video generation. Regarding reward, we move beyond learning reward functions with external image-based models: instead, we leverage the generated 3D occupancy to directly define rule-based dense rewards for driving compliance and safety. Extensive experiments demonstrate that OmniNWM achieves state-of-the-art performance in video generation, control accuracy, and long-horizon stability, while providing a reliable closed-loop evaluation framework through occupancy-grounded rewards. Project page is available at https://github.com/Arlo0o/OmniNWM.
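To make the action encoding concrete: a Plücker ray represents each pixel's viewing ray as a 6-vector (direction d, moment m = o × d), so a camera pose and intrinsics become a dense per-pixel control signal. The sketch below is a minimal NumPy illustration of this general construction, not the paper's implementation; the function name, normalization, and conventions (camera-to-world pose, pixel-center sampling) are assumptions.

```python
import numpy as np

def plucker_ray_map(K, c2w, H, W):
    """Per-pixel Plücker ray map (direction d, moment m = o x d).

    K: (3, 3) camera intrinsics; c2w: (4, 4) camera-to-world pose.
    Returns an (H, W, 6) array: unit ray directions and moments.
    """
    # Pixel grid sampled at pixel centers.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)
    # Back-project pixels to camera-frame directions, rotate to world frame.
    dirs = (pix @ np.linalg.inv(K).T) @ c2w[:3, :3].T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)      # unit length
    o = c2w[:3, 3]                                            # camera origin
    moments = np.cross(np.broadcast_to(o, dirs.shape), dirs)  # m = o x d
    return np.concatenate([dirs, moments], axis=-1)           # (H, W, 6)
```

A panoramic version would simply concatenate the maps of all surround-view cameras, giving the generator one spatially aligned conditioning tensor per frame.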
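The occupancy-grounded reward idea can likewise be sketched with a simple rule: score a planned ego position by how free the predicted occupancy grid is in its neighbourhood. Everything below (function name, grid extents, voxel size, clearance radius, the linear free-space score) is an illustrative assumption, not the paper's reward definition.

```python
import numpy as np

def occupancy_reward(occ, ego_xyz, voxel_size=0.5,
                     origin=(-40.0, -40.0, -3.0), clearance=2):
    """Rule-based dense reward from a predicted 3D occupancy grid (sketch).

    occ: (X, Y, Z) boolean grid, True = occupied.
    ego_xyz: planned ego position in metres, same frame as the grid.
    Returns a reward in [0, 1]: 1 when the voxel neighbourhood around the
    ego position is free, falling linearly with the occupied fraction.
    """
    # Convert the metric position to a voxel index.
    idx = np.floor((np.asarray(ego_xyz) - np.asarray(origin)) / voxel_size)
    idx = idx.astype(int)
    # Clip a (2*clearance+1)^3 neighbourhood to the grid bounds.
    lo = np.maximum(idx - clearance, 0)
    hi = np.minimum(idx + clearance + 1, occ.shape)
    patch = occ[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    # Off-grid positions are treated as unsafe.
    occupied_frac = patch.mean() if patch.size else 1.0
    return 1.0 - float(occupied_frac)
```

Because the reward is computed from the generated occupancy rather than an external image-based model, the same rule can be evaluated densely along a rollout for closed-loop scoring of compliance and safety.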