🤖 AI Summary
To address distribution shift and causal confusion in vision-language-action (VLA) models for autonomous driving, problems that stem from overreliance on imitation learning, MindDrive introduces an online reinforcement learning (RL)-based VLA framework. Instead of exploring inefficiently in a continuous trajectory space, it performs trial-and-error decision-making over a finite, discrete set of linguistic driving decisions. Its key contributions are: (1) the first online RL-VLA framework tailored for autonomous driving; (2) a large language model fine-tuned with two sets of LoRA parameters, which decouples high-level scenario reasoning and decision-making (a Decision Expert) from mapping those decisions to feasible trajectories (an Action Expert); and (3) a mechanism that feeds trajectory-level rewards back into the language reasoning space, so that RL optimization acts directly on the linguistic driving decisions. Evaluated on the Bench2Drive benchmark, MindDrive achieves a Driving Score (DS) of 78.04 and a Success Rate (SR) of 55.09%, which the authors present as the first demonstration of effective online RL for a VLA driving model.
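The dual-LoRA idea in contribution (2) can be illustrated with a toy layer in which one frozen base weight is shared by two low-rank adapters, and the caller picks which adapter is active. This is a minimal sketch under stated assumptions: it uses the standard LoRA update W x + (alpha / r) * B (A x), while the adapter keys `"decision"` and `"action"`, the tiny shapes, and the per-call switching are hypothetical illustrations, not MindDrive's actual layer placement, rank, or switching mechanism.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Low-rank update (alpha / r) * B @ A with A: r x d_in and B: d_out x r."""
    A: Matrix
    B: Matrix
    alpha: float

    def delta(self, x: Vector) -> Vector:
        scale = self.alpha / len(self.A)  # len(A) is the rank r
        return [scale * v for v in matvec(self.B, matvec(self.A, x))]


class DualLoRALinear:
    """A frozen base weight shared by two named LoRA adapters (hypothetical sketch)."""

    def __init__(self, base: Matrix, adapters: Dict[str, LoRAAdapter]):
        self.base = base          # frozen pretrained weight, shared by both experts
        self.adapters = adapters  # hypothetical keys: "decision" and "action"

    def forward(self, x: Vector, expert: str) -> Vector:
        # y = W x + (alpha / r) * B (A x), using only the selected expert's adapter
        y = matvec(self.base, x)
        return [a + b for a, b in zip(y, self.adapters[expert].delta(x))]


# Toy 2x2 example with rank-1 adapters.
layer = DualLoRALinear(
    base=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "decision": LoRAAdapter(A=[[1.0, 0.0]], B=[[1.0], [0.0]], alpha=1.0),
        "action": LoRAAdapter(A=[[0.0, 1.0]], B=[[0.0], [1.0]], alpha=2.0),
    },
)
decision_out = layer.forward([2.0, 3.0], "decision")  # [4.0, 3.0]
action_out = layer.forward([2.0, 3.0], "action")      # [2.0, 9.0]
```

Because only the small A and B matrices differ between the two roles, switching experts in this sketch is a dictionary lookup rather than a second copy of the base model; that is one plausible motivation for the design, not a detail confirmed by the summary.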
📝 Abstract
Current Vision-Language-Action (VLA) paradigms in autonomous driving rely primarily on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning (RL) offers a promising way to address these issues through trial-and-error learning. However, applying online RL to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework built on a single large language model (LLM) with two distinct sets of LoRA parameters. With one set, the LLM serves as a Decision Expert for scenario reasoning and driving decision-making; with the other, it acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions instead of operating directly in a continuous action space. This approach balances optimal decision-making in complex scenarios, human-like driving behavior, and efficient exploration during online RL. MindDrive achieves strong closed-loop performance on the challenging Bench2Drive benchmark, with a Driving Score (DS) of 78.04 and a Success Rate (SR) of 55.09%. To the best of our knowledge, this is the first work to demonstrate the effectiveness of online RL for VLA models in autonomous driving.
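To make "trial-and-error learning over a finite set of discrete linguistic driving decisions" with "trajectory-level rewards" concrete, the sketch below samples a decision from a softmax policy, lets a stand-in Action Expert turn it into waypoints, scores the whole trajectory, and pushes that score back onto the decision logits. The abstract does not name its RL algorithm, so plain REINFORCE with a running-mean baseline is swapped in here for illustration. The decision set, the `action_expert` waypoint table, the obstacle at 10 m, and the `trajectory_reward` rule are all hypothetical toys, not MindDrive's components.

```python
import math
import random
from typing import List, Tuple

# Hypothetical finite set of linguistic driving decisions.
DECISIONS = ["keep lane", "slow down", "stop"]
OBSTACLE_X = 10.0   # assumed static obstacle 10 m ahead in the ego lane
MIN_PROGRESS = 5.0  # assumed minimum forward progress (m) for a positive reward

Trajectory = List[Tuple[float, float]]


def action_expert(decision: str) -> Trajectory:
    """Hypothetical stand-in for the Action Expert: maps a decision to (x, y) waypoints."""
    xs = {
        "keep lane": [4.0, 8.0, 12.0, 16.0, 20.0],
        "slow down": [3.0, 5.0, 6.5, 7.5, 8.0],
        "stop": [1.0, 1.5, 1.5, 1.5, 1.5],
    }[decision]
    return [(x, 0.0) for x in xs]


def trajectory_reward(traj: Trajectory) -> float:
    """Toy trajectory-level reward: 0 on collision or too little progress, else 1."""
    collided = any(x >= OBSTACLE_X for x, _ in traj)
    progress = traj[-1][0]
    return 1.0 if not collided and progress >= MIN_PROGRESS else 0.0


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes: int = 300, lr: float = 0.5, seed: int = 0) -> List[float]:
    """REINFORCE over the discrete decision set; the reward is computed on the trajectory."""
    rng = random.Random(seed)
    logits = [0.0] * len(DECISIONS)
    baseline = 0.0  # running mean of rewards, used to reduce gradient variance
    for t in range(1, episodes + 1):
        probs = softmax(logits)
        k = rng.choices(range(len(DECISIONS)), weights=probs)[0]
        reward = trajectory_reward(action_expert(DECISIONS[k]))
        advantage = reward - baseline
        # Gradient of log softmax(logits)[k] w.r.t. the logits is onehot(k) - probs.
        for i in range(len(logits)):
            logits[i] += lr * advantage * ((1.0 if i == k else 0.0) - probs[i])
        baseline += (reward - baseline) / t  # update the mean after the policy step
    return softmax(logits)


final_probs = train()
```

Exploration here happens over three discrete choices rather than over continuous waypoint coordinates, which is the efficiency argument the abstract makes; in this toy, the policy ends up favoring `"slow down"`, the only decision whose trajectory avoids the obstacle while still making progress.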