Reason–Imagine–Act: Closed-Loop LLM Decision Making with World Models for Autonomous Driving

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Driving decision methods that rely on language semantics alone struggle to combine accurate intent understanding with physical feasibility in dynamic traffic, which can lead to unsafe behavior. This work proposes RIA, a closed-loop decision framework that couples a large language model (LLM) with an action-conditioned world model at decision time. At each step, the LLM proposes candidate actions, the world model evaluates them with online short-horizon rollouts, and a safety scorer selects the safest executable candidate. On a CARLA point-goal navigation protocol (1000 episodes), RIA achieves 80.05% route completion, a 51.10% arrival rate, and a 0.20% collision rate, outperforming training-free baselines such as CARLA TM and MADA on core closed-loop metrics.

📝 Abstract

Large language models (LLMs) are promising for autonomous driving, but semantics-only decision policies can yield physically unsafe behavior in dynamic traffic. Existing methods either perform online language reasoning without explicit dynamics verification or use world models mainly in offline pipelines, leaving a gap between semantic intent and physical feasibility at decision time. We propose Reason–Imagine–Act (RIA), a closed-loop framework that couples an LLM reasoner with an action-conditioned world model for online safety verification. At each step, the LLM proposes an action template and candidate sub-actions, the world model performs short-horizon rollouts, and a safety scorer selects the safest executable action with feedback to the next reasoning step. Under a unified CARLA point-goal protocol (1000 episodes), RIA achieves 80.05% route completion, 51.10% arrival rate, and 0.20% collision rate. Under the same closed-loop interface, RIA consistently outperforms training-free baselines, including CARLA TM and MADA, on core closed-loop metrics. For reproducibility, code is available at https://github.com/pku-smart-city/source_code/tree/main/RIA.
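
The propose, rollout, score, and feedback cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D car-following state, the rule-based `toy_reasoner` standing in for the LLM, the constant-acceleration `toy_world_model`, the gap-based `safety_score`, and all thresholds are hypothetical choices made only to show the control flow.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class State:
    ego_pos: float     # ego position along the lane (m)
    ego_speed: float   # ego speed (m/s)
    lead_pos: float    # lead vehicle position (m)
    lead_speed: float  # lead vehicle speed (m/s)


@dataclass(frozen=True)
class Action:
    name: str
    accel: float       # commanded acceleration (m/s^2)


def toy_reasoner(state: State, feedback: Optional[float]) -> List[Action]:
    """Hypothetical stand-in for the LLM: candidate sub-actions in preference order."""
    candidates = [Action("accelerate", 1.5), Action("keep", 0.0), Action("brake", -3.0)]
    # Feedback from the previous step: after a low safety score, stop proposing acceleration.
    if feedback is not None and feedback < 0.5:
        candidates = [a for a in candidates if a.accel <= 0.0]
    return candidates


def toy_world_model(state: State, action: Action, horizon: int = 10, dt: float = 0.2) -> List[State]:
    """Assumed action-conditioned model: constant-acceleration kinematic rollout."""
    trajectory, s = [], state
    for _ in range(horizon):
        speed = max(0.0, s.ego_speed + action.accel * dt)
        s = State(s.ego_pos + speed * dt, speed, s.lead_pos + s.lead_speed * dt, s.lead_speed)
        trajectory.append(s)
    return trajectory


def safety_score(trajectory: List[State], safe_gap: float = 10.0) -> float:
    """Assumed scorer in [0, 1]: 0 if the rollout predicts a collision, 1 at a comfortable gap."""
    min_gap = min(s.lead_pos - s.ego_pos for s in trajectory)
    return 0.0 if min_gap <= 0.0 else min(1.0, min_gap / safe_gap)


def ria_step(state: State, feedback: Optional[float], min_score: float = 0.3) -> Tuple[Action, float]:
    """One decision step: propose candidates, imagine rollouts, pick the safest."""
    scored = [(safety_score(toy_world_model(state, a)), a) for a in toy_reasoner(state, feedback)]
    # max() keeps the first maximum, so ties go to the earlier (preferred) candidate.
    best_score, best = max(scored, key=lambda pair: pair[0])
    if best_score < min_score:
        # No proposed candidate is acceptable: fall back to a hard brake (assumed policy).
        best = Action("emergency_brake", -6.0)
        best_score = safety_score(toy_world_model(state, best))
    return best, best_score


def run_closed_loop(state: State, steps: int = 5) -> List[Tuple[str, float]]:
    """Run several steps, feeding each step's safety score back to the reasoner."""
    feedback, log = None, []
    for _ in range(steps):
        action, score = ria_step(state, feedback)
        log.append((action.name, score))
        # The toy model also plays the simulator here; a real setup would step CARLA instead.
        state = toy_world_model(state, action, horizon=1)[-1]
        feedback = score
    return log
```

Calling `run_closed_loop(State(0.0, 10.0, 8.0, 5.0))` starts the ego vehicle 8 m behind a slower lead vehicle. The rollouts for accelerating and keeping speed both predict a collision, so the first selected action is `brake`.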

Problem

Research questions and friction points this paper is trying to address.

autonomous driving

large language models

world models

closed-loop decision making

physical safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

closed-loop decision making

world models

large language models