OmniNWM: Omniscient Driving Navigation World Models

📅 2025-10-21
📈 Citations: 0
Influential citations: 0
📄 PDF
🤖 AI Summary
Existing autonomous driving world models are typically limited to a single state modality, short video sequences, coarse-grained action control, and no explicit reward modeling, which prevents joint modeling of state, action, and reward. This paper introduces OmniNWM, presented as the first unified world model for autonomous driving to address all three: it enables pixel-level trajectory control via a panoramic Plücker ray-map representation; constructs a regularized, dense, and differentiable reward function from generative 3D occupancy prediction; and jointly models RGB, semantics, metric depth, and 3D occupancy. The framework supports long-horizon autoregressive generation and closed-loop navigation evaluation. Experiments demonstrate state-of-the-art video fidelity, action-control accuracy, and long-horizon stability, while substantially improving the simulation of driving compliance and safety.
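The summary does not spell out how the ray-map is built. Below is a minimal sketch of the standard per-pixel Plücker parameterization such a representation rests on, assuming a pinhole camera with intrinsics `K` and a camera-to-world pose `(R, t)`; the function name and per-camera scope are illustrative, and how OmniNWM normalizes and assembles per-camera maps into a panoramic tensor is specified in the paper, not here.

```python
import numpy as np

def plucker_ray_map(K: np.ndarray, R: np.ndarray, t: np.ndarray,
                    H: int, W: int) -> np.ndarray:
    """Per-pixel Plücker ray map of shape (H, W, 6) for one pinhole camera.

    K: 3x3 intrinsics; R: 3x3 camera-to-world rotation;
    t: (3,) camera center in world coordinates.
    """
    # Pixel centers in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)            # (H, W, 3)

    # Back-project to world-space ray directions, normalized to unit length.
    dirs = pix @ np.linalg.inv(K).T @ R.T                        # (H, W, 3)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Plücker moment m = t x d; (d, m) identifies the 3D line through the pixel.
    moments = np.cross(np.broadcast_to(t, dirs.shape), dirs)     # (H, W, 3)
    return np.concatenate([dirs, moments], axis=-1)              # (H, W, 6)
```

Evaluating this for each camera pose along an input trajectory and stacking the results yields a pixel-aligned control signal of the kind the abstract describes.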

📝 Abstract
Autonomous driving world models are expected to work effectively across three core dimensions: state, action, and reward. Existing models, however, are typically restricted to limited state modalities, short video sequences, imprecise action control, and a lack of reward awareness. In this paper, we introduce OmniNWM, an omniscient panoramic navigation world model that addresses all three dimensions within a unified framework. For state, OmniNWM jointly generates panoramic videos of RGB, semantics, metric depth, and 3D occupancy. A flexible forcing strategy enables high-quality long-horizon auto-regressive generation. For action, we introduce a normalized panoramic Plücker ray-map representation that encodes input trajectories into pixel-level signals, enabling highly precise and generalizable control over panoramic video generation. Regarding reward, we move beyond learning reward functions with external image-based models: instead, we leverage the generated 3D occupancy to directly define rule-based dense rewards for driving compliance and safety. Extensive experiments demonstrate that OmniNWM achieves state-of-the-art performance in video generation, control accuracy, and long-horizon stability, while providing a reliable closed-loop evaluation framework through occupancy-grounded rewards. Project page is available at https://github.com/Arlo0o/OmniNWM.
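The abstract credits a flexible forcing strategy for long-horizon auto-regressive generation without detailing it here. As a rough illustration of the rollout pattern such a strategy enables, the sketch below generates frames one at a time, conditioning each step on a window of previously generated frames plus the pixel-aligned Plücker ray-map for the commanded trajectory; `model.denoise_next`, the window size, and the tensor shapes are assumptions for illustration, not the paper's API.

```python
import torch

def autoregressive_rollout(model, context: torch.Tensor,
                           ray_maps: torch.Tensor, horizon: int,
                           window: int = 8) -> torch.Tensor:
    """Roll a video world model forward one frame at a time.

    context:  (T0, C, H, W) observed frames that seed the rollout.
    ray_maps: (T0 + horizon, 6, H, W) per-frame Plücker ray maps
              encoding the commanded trajectory.
    """
    frames = list(context)
    for _ in range(horizon):
        # Condition on the most recent `window` frames plus the
        # pixel-aligned action signal for the frame being generated.
        ctx = torch.stack(frames[-window:])
        frames.append(model.denoise_next(ctx, ray_maps[len(frames)]))
    return torch.stack(frames)
```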
Problem

Research questions and friction points this paper is trying to address.

Generating panoramic videos with RGB, semantics, depth, and 3D occupancy
Enabling precise control over panoramic video generation using trajectory encoding
Defining rule-based dense rewards for driving compliance and safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates panoramic videos with multiple state modalities
Encodes trajectories into pixel-level signals for precise control
Defines dense rewards using generated 3D occupancy (see the sketch below)
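To make the occupancy-grounded reward concrete, here is a minimal sketch of a rule-based dense reward over a generated semantic occupancy grid, penalizing collision (ego footprint intersecting obstacles) and off-road driving (ground footprint leaving the drivable surface). The class labels, weights, and footprint-voxel interface are hypothetical; the paper defines its own compliance and safety rules.

```python
import numpy as np

# Hypothetical semantic labels in the generated occupancy grid.
FREE, DRIVABLE, OBSTACLE = 0, 1, 2

def occupancy_reward(occ: np.ndarray, ego_voxels: np.ndarray,
                     w_collision: float = 1.0, w_offroad: float = 0.5) -> float:
    """Dense rule-based reward from a semantic occupancy grid.

    occ:        (X, Y, Z) integer grid of semantic classes generated
                by the world model for the current step.
    ego_voxels: (N, 3) voxel indices covered by the ego vehicle.
    """
    labels = occ[ego_voxels[:, 0], ego_voxels[:, 1], ego_voxels[:, 2]]

    # Safety: fraction of the ego footprint intersecting obstacles.
    collision = np.mean(labels == OBSTACLE)

    # Compliance: fraction of the ground-level footprint that has
    # left the drivable surface.
    ground = ego_voxels[:, 2] == ego_voxels[:, 2].min()
    offroad = np.mean(labels[ground] != DRIVABLE)

    return float(-(w_collision * collision + w_offroad * offroad))
```

Because the reward is computed from the model's own generated occupancy rather than an external image-based critic, it can be evaluated densely at every rollout step, which supports the occupancy-grounded closed-loop evaluation the abstract describes.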
Authors

Bohan Li
Shanghai Jiao Tong University; Eastern Institute of Technology, Ningbo
Zhuang Ma
The Wharton School, University of Pennsylvania
Machine Learning, Statistics
Dalong Du
PhiGent
Baorui Peng
Eastern Institute of Technology, Ningbo
Zhujin Liang
Bigo Live
Computer Vision, Machine Learning, Deep Learning
Zhenqiang Liu
PhiGent
Chao Ma
Shanghai Jiao Tong University
Yueming Jin
Assistant Professor, National University of Singapore
Medical Image Analysis, Surgical AI & Robotics, Multimodal Learning
Hao Zhao
Tsinghua University
Wenjun Zeng
Eastern Institute of Technology, Ningbo
Xin Jin
Eastern Institute of Technology, Ningbo