MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases

📅 2025-09-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current vision-language models (VLMs) exhibit hallucinations and poor generalization under out-of-distribution (OOD) conditions, particularly in complex, dynamic driving scenarios, which hinders their real-world deployment in end-to-end autonomous driving. To address this, the authors propose a memory-tool collaborative closed-loop reasoning framework: contextual awareness is enhanced via experience-based memory retrieval, while dynamic tool invocation enables proactive reasoning and decision-making in long-tail situations (e.g., road construction). They further introduce Roadwork-VLM, the first benchmark specifically designed to evaluate VLMs on construction-related driving scenarios. On NAVSIM, the method achieves a PDMS of 88.3 and, on high-level planning, a driving metric score of 79.8% and a planning accuracy of 82.6%; under zero-shot evaluation on Roadwork-VLM it attains a driving metric score of 80.2%, substantially improving the robustness and generalization of VLMs in complex, dynamic environments.

📝 Abstract
Vision-Language Models (VLMs) have demonstrated significant potential for end-to-end autonomous driving, yet a substantial gap remains between their current capabilities and the reliability necessary for real-world deployment. A critical challenge is their fragility, characterized by hallucinations and poor generalization in out-of-distribution (OOD) scenarios. To bridge this gap, we introduce MTRDrive, a novel framework that integrates procedural driving experiences with a dynamic toolkit to enhance generalization and proactive decision-making. MTRDrive addresses these limitations through a closed-loop system that combines a memory-based experience retrieval mechanism with dynamic toolkits. This synergy enables the model to interact more effectively with its environment, improving both reasoning and decision-making through memory-tool synergistic reasoning. Additionally, we introduce a new benchmark based on complex roadwork construction scenarios to rigorously evaluate zero-shot generalization. Extensive experiments demonstrate the superior effectiveness of our approach. On the public NAVSIM benchmark, our 3B-parameter MTRDrive model achieves an exceptional PDMS of 88.3 without chain-of-thought and sets a state-of-the-art performance bar on high-level planning, with a driving metric score of 79.8% and a planning accuracy of 82.6%. Rigorous zero-shot evaluation on the new Roadwork-VLM benchmark shows a strong ability to reason robustly in unseen scenarios, achieving a driving metric score of 80.2%. These results highlight MTRDrive's potential to advance autonomous driving toward safer and more reliable systems.
Problem

Research questions and friction points this paper is trying to address.

Addresses fragility of Vision-Language Models in autonomous driving scenarios
Improves generalization and decision-making in out-of-distribution conditions
Reduces hallucinations in complex corner cases for reliable deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates procedural driving experiences with dynamic toolkits
Uses memory-based experience retrieval in a closed-loop system
Enables synergistic reasoning for robust decision-making
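The closed loop described above (retrieve relevant past experience, invoke tools matched to the scene, decide, then write the outcome back to memory) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: all names (`Experience`, `ExperienceMemory`, `drive_step`, the tag-overlap retrieval, and the trigger-based toolkit) are assumptions made for clarity.

```python
# Hypothetical sketch of memory-tool synergistic reasoning; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Experience:
    scene_tags: frozenset  # e.g. {"roadwork", "lane_closed"}
    action: str            # high-level maneuver that worked in that scene

@dataclass
class ExperienceMemory:
    entries: list = field(default_factory=list)

    def store(self, exp: Experience) -> None:
        self.entries.append(exp)

    def retrieve(self, scene_tags: frozenset, k: int = 1) -> list:
        # Rank stored experiences by tag overlap with the current scene.
        ranked = sorted(self.entries,
                        key=lambda e: len(e.scene_tags & scene_tags),
                        reverse=True)
        return ranked[:k]

# A "dynamic toolkit": tools the policy may invoke for long-tail scenes.
TOOLS = {
    "cone_detector": lambda tags: "lane_closed" in tags,
    "sign_reader":   lambda tags: "temp_sign" in tags,
}

def drive_step(memory: ExperienceMemory, scene_tags: frozenset) -> str:
    # 1) Memory: recall the closest past experience for context.
    recalled = memory.retrieve(scene_tags)
    # 2) Tools: invoke every tool whose trigger matches the scene.
    tool_hits = [name for name, trig in TOOLS.items() if trig(scene_tags)]
    # 3) Decide: reuse a recalled maneuver when the scene overlaps memory,
    #    otherwise fall back to a cautious default.
    if recalled and recalled[0].scene_tags & scene_tags:
        action = recalled[0].action
    else:
        action = "slow_and_observe"
    # 4) Close the loop: store this step as a new experience.
    memory.store(Experience(scene_tags, action))
    return f"{action} (tools: {sorted(tool_hits)})"
```

For example, after storing one roadwork experience with action `merge_left`, a new scene tagged `{"roadwork", "lane_closed", "cones"}` recalls that experience, triggers the cone detector, and reuses the maneuver; the design point is that memory supplies context while tools supply fresh perception, and each step enlarges the memory.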
Authors
Ziang Luo · Tsinghua University (Autonomous Driving)
Kangan Qian · School of Vehicle and Mobility, Tsinghua University, Beijing, China
Jiahua Wang · Automotive and Robotics, Xiaomi Corporation, Beijing, China
Yuechen Luo · School of Vehicle and Mobility, Tsinghua University, Beijing, China
Jinyu Miao · School of Vehicle and Mobility, Tsinghua University, Beijing, China
Zheng Fu · Tsinghua University
Yunlong Wang · School of Vehicle and Mobility, Tsinghua University, Beijing, China
Sicong Jiang · McGill University, 2077AI (Large Language Models; Vision Language Models; Autonomous Driving; Trustworthy AI)
Zilin Huang · University of Wisconsin–Madison (Human-Centered AI; Human-AI Collaboration; Autonomous Driving; Robotics; Intelligent Transportation)
Yifei Hu · Automotive and Robotics, Xiaomi Corporation, Beijing, China
Yuhao Yang · University of Hong Kong (Large Language Models; Agentic Models; Foundation Models; Graph Learning)
Hao Ye · Automotive and Robotics, Xiaomi Corporation, Beijing, China
Mengmeng Yang · School of Vehicle and Mobility, Tsinghua University, Beijing, China
Xiaojian Dong · Automotive and Robotics, Xiaomi Corporation, Beijing, China
Kun Jiang · Tsinghua University (Autonomous Driving)
Diange Yang · School of Vehicle and Mobility, Tsinghua University, Beijing, China