🤖 AI Summary
To address the limitations of evaluating autonomous driving policies with real-world driving data, particularly the scarcity of rare hazardous or non-expert behaviors, this paper proposes a controllable driving world model. It integrates expert trajectories with diverse, synthetically generated non-expert behaviors from simulation to enable high-fidelity, highly controllable future prediction in open scenarios. Methodologically, the authors introduce the first heterogeneous-data-driven training paradigm and design a Video2Reward module that provides an end-to-end differentiable mapping from generated video sequences to reward signals. The architecture combines a diffusion-based Transformer, multi-source conditional fusion, CARLA-based data augmentation, and a dedicated reward estimation network. Experiments demonstrate a 44% improvement in visual fidelity of generated video, over 50% gains in controllability for both expert and non-expert actions, a 2% boost in NAVSIM planning performance, and a 25% increase in policy selection accuracy.
📝 Abstract
How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data composed mainly of safe expert trajectories, struggle to follow hazardous or non-expert behaviors, which are rare in such data. This limitation restricts their applicability to tasks such as policy evaluation. In this work, we address this challenge by enriching real-world human demonstrations with diverse non-expert data collected from a driving simulator (e.g., CARLA), and building a controllable world model trained on this heterogeneous corpus. Starting with a video generator featuring a diffusion transformer architecture, we devise several strategies to effectively integrate conditioning signals and improve prediction controllability and fidelity. The resulting model, ReSim, enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. To close the gap between high-fidelity simulation and applications that require reward signals to judge different actions, we introduce a Video2Reward module that estimates a reward from ReSim's simulated future. Our ReSim paradigm achieves up to 44% higher visual fidelity, improves controllability for both expert and non-expert actions by over 50%, and boosts planning and policy selection performance on NAVSIM by 2% and 25%, respectively.
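To make the Video2Reward idea concrete, here is a minimal NumPy sketch of how a reward head could map a simulated clip to a scalar used for comparing candidate actions. This is an illustrative stand-in, not the paper's implementation: the pooling scheme, the linear head, and all dimensions are assumptions (the actual module is a learned reward estimation network trained end-to-end).

```python
import numpy as np

def pool_video(video):
    """Pool a (T, H, W, C) clip into a per-frame channel-mean trajectory (T, C).

    A stand-in for a learned video encoder: it reduces each frame to a small
    feature vector so a simple head can score the whole clip.
    """
    return video.mean(axis=(1, 2))

class Video2Reward:
    """Hypothetical reward head: score a simulated future with one scalar."""

    def __init__(self, n_channels=3, seed=0):
        rng = np.random.default_rng(seed)
        # Illustrative "learned" parameters; in practice these would be
        # trained so the reward ranks safe futures above hazardous ones.
        self.w = rng.standard_normal(n_channels) * 0.1
        self.b = 0.0

    def __call__(self, video):
        feats = pool_video(video)            # (T, C) per-frame features
        per_frame = feats @ self.w + self.b  # (T,) per-frame scores
        return float(per_frame.mean())       # scalar reward for the clip
```

In a policy-selection loop, each candidate action would be rolled out by the world model into a clip, scored by such a head, and the highest-reward action chosen; because every step is differentiable, gradients could also flow from the reward back through the generated frames.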