MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving

📅 2026-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing autonomous driving trajectory planning methods: they rely predominantly on single-turn reasoning and struggle with complex, long-tail scenarios that require iterative refinement. To overcome this, the authors propose MTDrive, a framework that introduces a multi-turn interactive reinforcement learning paradigm, enabling multimodal large language models to iteratively refine driving trajectories based on environmental feedback. The key contributions are the Multi-Turn Group Relative Policy Optimization (mtGRPO) algorithm, which mitigates reward sparsity by computing relative advantages across turns; an interactive trajectory understanding dataset, built from closed-loop simulation, that supports multi-turn training; and system-level optimizations that reduce the data transfer overhead of high-resolution images and multi-turn sequences. On the NAVSIM benchmark, MTDrive outperforms existing methods, and the system-level optimizations increase training throughput by 2.5×.
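The multi-turn refinement idea can be pictured as a propose-evaluate loop. The sketch below is a minimal illustration under stated assumptions, not MTDrive's implementation: the policy and the environment are stand-in callables (`multi_turn_refine`, `toy_policy`, and `toy_environment` are hypothetical names), a trajectory is a list of (x, y) waypoints, and the loop stops once a turn's score passes a threshold or the turn budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Waypoint = Tuple[float, float]  # (longitudinal, lateral) position in metres (assumed)


@dataclass
class Turn:
    trajectory: List[Waypoint]
    score: float
    feedback: str


def multi_turn_refine(
    propose: Callable[[List[Turn]], List[Waypoint]],
    evaluate: Callable[[List[Waypoint]], Tuple[float, str]],
    max_turns: int = 3,
    accept_score: float = 0.9,
) -> List[Turn]:
    """Hypothetical loop: the policy sees every earlier turn (trajectory plus
    environment feedback) and proposes a revised trajectory."""
    history: List[Turn] = []
    for _ in range(max_turns):
        trajectory = propose(history)
        score, feedback = evaluate(trajectory)
        history.append(Turn(trajectory, score, feedback))
        if score >= accept_score:  # good enough, stop refining
            break
    return history


# Toy stand-ins (hypothetical): an obstacle requires a lateral shift of >= 1 m.
def toy_policy(history: List[Turn]) -> List[Waypoint]:
    # Shift 0.5 m further left for every earlier turn in the history.
    offset = 0.5 * len(history)
    return [(float(x), offset) for x in range(5)]


def toy_environment(trajectory: List[Waypoint]) -> Tuple[float, str]:
    # Score the trajectory by its smallest lateral offset from the obstacle.
    min_lateral = min(y for _, y in trajectory)
    if min_lateral >= 1.0:
        return 1.0, "ok"
    return 0.5, "collision risk: shift left"


history = multi_turn_refine(toy_policy, toy_environment)
for i, turn in enumerate(history):
    print(i, turn.trajectory[0][1], turn.score, turn.feedback)
```

In a real system the `propose` step would be the MLLM conditioned on images plus the dialogue of earlier turns, and `evaluate` would come from closed-loop simulation; here both are deliberately trivial so the control flow is easy to follow.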

📝 Abstract

Trajectory planning is a core task in autonomous driving, requiring the prediction of safe and comfortable paths across diverse scenarios. Integrating Multi-modal Large Language Models (MLLMs) with Reinforcement Learning (RL) has shown promise in addressing "long-tail" scenarios. However, existing methods are constrained to single-turn reasoning, limiting their ability to handle complex tasks requiring iterative refinement. To overcome this limitation, we present MTDrive, a multi-turn framework that enables MLLMs to iteratively refine trajectories based on environmental feedback. MTDrive introduces Multi-Turn Group Relative Policy Optimization (mtGRPO), which mitigates reward sparsity by computing relative advantages across turns. We further construct an interactive trajectory understanding dataset from closed-loop simulation to support multi-turn training. Experiments on the NAVSIM benchmark demonstrate superior performance compared to existing methods, validating the effectiveness of our multi-turn reasoning paradigm. Additionally, we implement system-level optimizations to reduce data transfer overhead caused by high-resolution images and multi-turn sequences, achieving a 2.5x increase in training throughput. Our data, models, and code will be made available soon.
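The abstract says only that mtGRPO computes relative advantages across turns. As one plausible reading, and not the paper's definition, the sketch below starts from the standard GRPO group-normalized advantage (each reward's z-score within its group of rollouts) and applies it separately at each turn. The layout `rewards[g][t]` (reward of rollout g at turn t) and the function names are assumptions made for illustration.

```python
import math
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage: z-score of each reward within its group.
    Uses the population standard deviation; eps guards against a zero std."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def per_turn_group_advantages(
    rewards: List[List[float]], eps: float = 1e-6
) -> List[List[float]]:
    """Assumed multi-turn variant: rewards[g][t] is the reward of rollout g at
    turn t. Normalize across the G rollouts separately at each turn t, so each
    turn gets its own relative signal instead of one sparse final reward."""
    num_rollouts, num_turns = len(rewards), len(rewards[0])
    advantages = [[0.0] * num_turns for _ in range(num_rollouts)]
    for t in range(num_turns):
        column = [rewards[g][t] for g in range(num_rollouts)]
        for g, a in enumerate(grpo_advantages(column, eps)):
            advantages[g][t] = a
    return advantages


# Example: G = 3 rollouts of the same scenario, T = 2 refinement turns each.
rewards = [
    [0.2, 0.8],  # rollout 0: turn 0, turn 1
    [0.6, 0.6],  # rollout 1
    [0.4, 1.0],  # rollout 2
]
advantages = per_turn_group_advantages(rewards)
for row in advantages:
    print([round(a, 3) for a in row])
```

Normalizing per turn means a rollout is credited for being better than its peers at that specific turn, even when no turn reaches a high absolute reward. Whether MTDrive groups rewards this way, or also compares turns within a single rollout, is not specified in the abstract.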

Problem

Research questions and friction points this paper is trying to address.

autonomous driving

trajectory planning

multi-turn reasoning

long-tail scenarios

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-turn Reinforcement Learning

Trajectory Planning

mtGRPO