X-Driver: Explainable Autonomous Driving with Vision-Language Models

📅 2025-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing end-to-end autonomous driving methods suffer from low closed-loop success rates and poor interpretability, which hinders real-world deployment. This paper proposes the first multimodal large language model (MLLM) framework tailored for closed-loop autonomous driving. It integrates structured chain-of-thought (CoT) reasoning with autoregressive action modeling to unify perception, decision-making, and control. The framework jointly encodes visual and linguistic inputs, performs CoT-based sequential reasoning, and is tightly integrated with the CARLA simulator. It achieves state-of-the-art closed-loop success rates on benchmarks including Bench2Drive. Key contributions include: (i) the first incorporation of CoT into closed-loop autonomous driving, which improves decision interpretability and system robustness; and (ii) the use of an MLLM to generate human-understandable driving logic and predict actions in a semantically coherent manner. Experimental results show both improved performance and transparent, traceable driving behavior.

📝 Abstract

End-to-end autonomous driving has advanced significantly, offering a simpler system design and stronger driving performance than conventional pipelines in both open-loop and closed-loop settings. However, existing frameworks still suffer from low success rates in closed-loop evaluations, which highlights their limitations for real-world deployment. In this paper, we introduce X-Driver, a unified multimodal large language model (MLLM) framework designed for closed-loop autonomous driving that leverages Chain-of-Thought (CoT) reasoning and autoregressive modeling to enhance perception and decision-making. We validate X-Driver across multiple autonomous driving tasks using public benchmarks in the CARLA simulation environment, including Bench2Drive [6]. Our experimental results demonstrate superior closed-loop performance, surpassing the current state of the art (SOTA) while improving the interpretability of driving decisions. These findings underscore the importance of structured reasoning in end-to-end driving and establish X-Driver as a strong baseline for future research in closed-loop autonomous driving.

Problem

Research questions and friction points this paper is trying to address.

Improving closed-loop autonomous driving success rates

Enhancing perception and decision-making with MLLMs

Increasing interpretability of autonomous driving decisions
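
To make the first item concrete, here is a minimal sketch of how a closed-loop success rate might be computed over a set of evaluation episodes. It assumes an episode counts as a success only when the route is completed with zero infractions; that criterion, the `EpisodeResult` type, and its field names are illustrative assumptions, not definitions taken from the paper or from Bench2Drive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpisodeResult:
    """Outcome of one closed-loop episode (hypothetical record layout)."""
    route_completed: bool
    infractions: int


def success_rate(results: List[EpisodeResult]) -> float:
    """Fraction of episodes that finished the route with no infractions.

    Assumed criterion for illustration only; a real benchmark may also
    enforce a time limit or weight infractions differently.
    """
    if not results:
        return 0.0
    successes = sum(
        1 for r in results if r.route_completed and r.infractions == 0
    )
    return successes / len(results)


results = [
    EpisodeResult(route_completed=True, infractions=0),
    EpisodeResult(route_completed=True, infractions=2),
    EpisodeResult(route_completed=False, infractions=0),
    EpisodeResult(route_completed=True, infractions=0),
]
print(success_rate(results))
```

Under this assumed criterion, a policy that reaches the goal but commits an infraction along the way still counts as a failure, which is one reason closed-loop scores can be much lower than open-loop trajectory metrics suggest.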

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified multimodal large language model (MLLM) framework

Leverages Chain-of-Thought and autoregressive modeling

Enhances perception and decision-making in driving
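
As a rough illustration of combining Chain-of-Thought with autoregressive modeling, the sketch below decodes reasoning tokens first and action tokens afterwards in a single left-to-right pass, so the action is conditioned on the reasoning that precedes it. Everything here is a hypothetical stand-in: the `<act>` and `<eos>` markers, the `Control` fields, and `toy_next_token` (a scripted function that plays the role of an MLLM decoding step) are assumptions for illustration, not X-Driver's actual vocabulary, action format, or model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical marker tokens; the paper's real tokenization is not specified here.
END_REASONING = "<act>"
END_SEQUENCE = "<eos>"


@dataclass
class Control:
    steer: float     # assumed range [-1, 1]
    throttle: float  # assumed range [0, 1]
    brake: float     # assumed range [0, 1]


NextToken = Callable[[str, List[str]], str]


def toy_next_token(observation: str, prefix: List[str]) -> str:
    """Scripted stand-in for one decoding step of an MLLM."""
    if "pedestrian" in observation:
        script = ["pedestrian", "ahead", "so", "stop",
                  END_REASONING, "0.0", "0.0", "1.0", END_SEQUENCE]
    else:
        script = ["lane", "clear", "so", "proceed",
                  END_REASONING, "0.0", "0.5", "0.0", END_SEQUENCE]
    return script[len(prefix)]


def decode(observation: str, next_token: NextToken,
           max_tokens: int = 32) -> List[str]:
    """Generate tokens one at a time, each conditioned on the full prefix."""
    tokens: List[str] = []
    while len(tokens) < max_tokens:
        tok = next_token(observation, tokens)
        tokens.append(tok)
        if tok == END_SEQUENCE:
            break
    return tokens


def split_reasoning_and_action(tokens: List[str]) -> Tuple[str, Control]:
    """Separate the readable reasoning from the three action values."""
    cut = tokens.index(END_REASONING)
    reasoning = " ".join(tokens[:cut])
    steer, throttle, brake = (float(t) for t in tokens[cut + 1:cut + 4])
    return reasoning, Control(steer, throttle, brake)


reasoning, control = split_reasoning_and_action(
    decode("pedestrian crossing ahead", toy_next_token)
)
print(reasoning)
print(control)
```

Because the reasoning text is emitted before the action values in the same sequence, it can be logged alongside each control command, which is the kind of traceability the interpretability claim refers to.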

🔎 Similar Papers

Safety Implications of Explainable Artificial Intelligence in End-to-End Autonomous Driving