🤖 AI Summary
Existing end-to-end autonomous driving methods suffer from low closed-loop success rates and poor interpretability, hindering real-world deployment. This paper proposes the first multimodal large language model (MLLM) framework tailored for closed-loop autonomous driving, integrating structured chain-of-thought (CoT) reasoning with autoregressive action modeling to unify perception, decision-making, and control. The framework jointly encodes visual and linguistic inputs, performs CoT-based sequential reasoning, and is tightly integrated with the CARLA simulator. It achieves state-of-the-art closed-loop success rates on benchmarks including Bench2Drive. Key contributions include: (i) the first incorporation of CoT into closed-loop autonomous driving, substantially enhancing decision interpretability and system robustness; and (ii) leveraging an MLLM to generate human-understandable driving logic and predict actions in a semantically coherent manner. Experimental results demonstrate both improved performance and transparent, traceable driving behavior.
📝 Abstract
End-to-end autonomous driving has advanced significantly, offering a simpler system design and stronger driving performance than conventional pipelines in both open-loop and closed-loop settings. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations for real-world deployment. In this paper, we introduce X-Driver, a unified multimodal large language model (MLLM) framework designed for closed-loop autonomous driving that leverages Chain-of-Thought (CoT) reasoning and autoregressive modeling to enhance perception and decision-making. We validate X-Driver across multiple autonomous driving tasks using public benchmarks in the CARLA simulation environment, including Bench2Drive [6]. Our experimental results demonstrate superior closed-loop performance, surpassing the current state of the art (SOTA) while improving the interpretability of driving decisions. These findings underscore the importance of structured reasoning in end-to-end driving and establish X-Driver as a strong baseline for future research in closed-loop autonomous driving.