OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

📅 2025-12-01

📈 Citations: 0

✨ Influential: 0

📄 PDF

🤖 AI Summary
Current two-stage fine-tuning paradigms in autonomous driving face two key bottlenecks: (1) supervised fine-tuning exhibits weak generalization, limiting complex reasoning capabilities; and (2) reinforcement fine-tuning is hindered by the non-quantifiability of rewards for open-ended scene understanding tasks. To address these, we propose OpenREADβ€”a novel framework that pioneers end-to-end reinforcement learning for open-ended scene understanding in autonomous driving. It employs a large language model (Qwen3) as a learnable evaluator within a vision-language joint architecture, enabling full-stack optimization from high-level decision-making to low-level trajectory planning. By leveraging large-scale chain-of-thought annotations and LLM-based quantification of reasoning quality, OpenREAD unifies reward modeling across upstream perception and downstream planning tasks. Evaluated on multiple reasoning and planning benchmarks, OpenREAD achieves state-of-the-art performance, empirically validating that reinforcement-driven open-ended reasoning significantly enhances both generalization and overall system performance in autonomous driving.

πŸ“ Abstract
Recently, two-stage fine-tuning strategies, e.g., acquiring essential driving knowledge through supervised fine-tuning (SFT) and further enhancing decision-making and planning via reinforcement fine-tuning (RFT), have shown strong potential in advancing the knowledge-driven autonomous driving (AD) paradigm. However, the learning nature of SFT still limits the generalization of reasoning, thereby constraining the full potential of driving performance. Meanwhile, current RFT approaches are primarily applied to downstream tasks, since scene understanding is an open-ended problem where corresponding rewards are difficult to quantify. To address these limitations, we propose OpenREAD, an OPEN-ended REasoning reinforced vision-language model (VLM)-based autonomous driving (AD) framework that enables end-to-end RFT across the full spectrum from high-level reasoning to low-level trajectory planning. Specifically, we begin by constructing large-scale Chain-of-Thought (CoT) annotations on open-source driving-related knowledge datasets, and employ the powerful Qwen3 large language model (LLM) as the critic in RFT to quantify reasoning quality for open-ended questions during reward modeling. Extensive experiments confirm that joint end-to-end RFT yields substantial improvements in both upstream and downstream tasks, enabling OpenREAD to achieve state-of-the-art performance on reasoning and planning benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Enhances open-ended reasoning in autonomous driving via reinforcement fine-tuning
Addresses reward quantification for scene understanding in open-ended problems
Improves generalization from high-level reasoning to low-level trajectory planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end reinforcement fine-tuning for autonomous driving
LLM-as-critic quantifies reasoning quality for open-ended questions
Chain-of-Thought annotations on driving datasets enhance reasoning
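The LLM-as-critic idea above can be sketched as a reward function for RFT: the policy VLM's open-ended answer is sent to a critic LLM (Qwen3 in the paper) along with the question and a reference chain-of-thought, and the critic's numeric judgment is mapped into a bounded reward. The prompt wording, the 0–10 score scale, and the parsing logic below are illustrative assumptions, not the paper's exact protocol:

```python
# Minimal sketch of LLM-as-critic reward modeling for open-ended reasoning.
# CRITIC_PROMPT, the 0-10 scale, and the stub critic are hypothetical;
# the paper's actual prompt and scoring protocol may differ.
import re

CRITIC_PROMPT = (
    "You are a driving-scene reasoning judge. Rate the candidate answer's "
    "reasoning quality from 0 to 10. Reply with the number only.\n"
    "Question: {q}\nReference CoT: {ref}\nCandidate answer: {ans}"
)

def critic_reward(question, reference_cot, answer, critic_llm):
    """Query a critic LLM (e.g. Qwen3) and map its score to a [0, 1] reward.

    critic_llm is any callable taking a prompt string and returning text.
    """
    reply = critic_llm(CRITIC_PROMPT.format(q=question, ref=reference_cot, ans=answer))
    match = re.search(r"\d+(?:\.\d+)?", reply)  # pull the first number out of the reply
    if match is None:
        return 0.0  # unparsable critic output yields zero reward
    score = float(match.group())
    return max(0.0, min(score / 10.0, 1.0))  # clamp into the RL reward range

# Usage with a stubbed critic standing in for a real LLM call:
stub = lambda prompt: "Score: 8"
r = critic_reward(
    "Why should the ego vehicle brake here?",
    "A pedestrian is entering the crosswalk ahead.",
    "Because a pedestrian is crossing in front of the vehicle.",
    stub,
)
```

This quantified reward is what lets open-ended scene-understanding questions be optimized jointly with trajectory-planning rewards in a single end-to-end RFT loop, rather than being excluded for lack of a verifiable answer.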
Songyan Zhang
Nanyang Technological University
Computer Vision, Autonomous Driving
Wenhui Huang
Harvard University, USA
Zhan Chen
Georgia Southern University
Mathematical modeling in biology and scientific computing
Chua Jiahao Collister
Nanyang Technological University, Singapore
Qihang Huang
Nanyang Technological University, Singapore
Chen Lv
Nanyang Technological University, Singapore