MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases

📅 2025-09-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current vision-language models (VLMs) hallucinate and generalize poorly under out-of-distribution (OOD) conditions, particularly in complex, dynamic driving scenarios, which hinders their real-world deployment in end-to-end autonomous driving. To address this, the paper proposes MTRDrive, a closed-loop reasoning framework in which memory and tools work together: experience-based memory retrieval supplies context, while dynamic tool invocation supports proactive reasoning and decision-making in long-tail situations such as road construction. The paper also introduces Roadwork-VLM, a new benchmark for evaluating VLMs on construction-related driving scenarios. On NAVSIM, the 3B-parameter model achieves a PDMS of 88.3 without chain-of-thought and, on high-level planning, a driving metric score of 79.8% and a planning accuracy of 82.6%. Under zero-shot evaluation on Roadwork-VLM, it attains a driving metric score of 80.2%, indicating improved robustness and generalization in complex, dynamic environments.

📝 Abstract

Vision-Language Models (VLMs) have demonstrated significant potential for end-to-end autonomous driving, yet a substantial gap remains between their current capabilities and the reliability necessary for real-world deployment. A critical challenge is their fragility, characterized by hallucinations and poor generalization in out-of-distribution (OOD) scenarios. To bridge this gap, we introduce MTRDrive, a novel framework that integrates procedural driving experiences with a dynamic toolkit to enhance generalization and proactive decision-making. MTRDrive addresses these limitations through a closed-loop system that combines a memory-based experience retrieval mechanism with dynamic toolkits. This memory-tool synergy enables the model to interact more effectively with its environment, improving both its reasoning and its decision-making. Additionally, we introduce a new benchmark based on complex roadwork construction scenarios to rigorously evaluate zero-shot generalization. Extensive experiments demonstrate the effectiveness of our approach. On the public NAVSIM benchmark, our 3B-parameter MTRDrive model achieves a PDMS of 88.3 without chain-of-thought and sets a new state of the art on high-level planning, with a driving metric score of 79.8% and a planning accuracy of 82.6%. Zero-shot evaluation on the new Roadwork-VLM benchmark shows a strong ability to reason robustly in unseen scenarios, achieving a driving metric score of 80.2%. These results highlight MTRDrive's potential to advance autonomous driving toward safer and more reliable systems.

Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models are fragile in autonomous driving scenarios

Generalization and decision-making degrade in out-of-distribution conditions

Hallucinations in complex corner cases stand in the way of reliable deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates procedural driving experiences with dynamic toolkits

Uses memory-based experience retrieval in a closed-loop system

Enables synergistic reasoning for robust decision-making
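
The interplay between memory retrieval and tool invocation can be illustrated with a small sketch. This is not the paper's implementation: the `Experience` record, the cosine-similarity retrieval, the two toy tools, and the rule-based decision that stands in for the VLM are all hypothetical, chosen only to show one pass in which recalled experience suggests a tool and the tool's output informs the action.

```python
import math
from dataclasses import dataclass


@dataclass
class Experience:
    """A stored driving experience (hypothetical schema, not from the paper)."""
    embedding: list[float]  # scene feature vector
    tool_hint: str          # name of the tool that helped in this past scene
    advice: str             # short procedural note


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(memory: list[Experience], query: list[float], k: int = 1) -> list[Experience]:
    """Return the k stored experiences most similar to the query embedding."""
    ranked = sorted(memory, key=lambda e: cosine(e.embedding, query), reverse=True)
    return ranked[:k]


# Toy stand-ins for perception tools (assumed names, for illustration only).
def detect_cones(scene: dict) -> str:
    return f"cones={scene.get('cones', 0)}"


def read_signs(scene: dict) -> str:
    return f"sign={scene.get('sign', 'none')}"


TOOLS = {"detect_cones": detect_cones, "read_signs": read_signs}


def plan(scene: dict, query: list[float], memory: list[Experience]) -> tuple[str, list[str]]:
    """One reasoning pass: recall experience, call the hinted tool, then decide."""
    observations = []
    for exp in retrieve(memory, query, k=1):
        tool = TOOLS.get(exp.tool_hint)
        if tool is not None:
            observations.append(tool(scene))
    # A fixed rule stands in for the VLM's decision: slow down if cones were seen.
    saw_cones = any(o.startswith("cones=") and o != "cones=0" for o in observations)
    action = "decelerate" if saw_cones else "keep_speed"
    return action, observations


memory = [
    Experience([1.0, 0.0], "detect_cones", "slow down near traffic cones"),
    Experience([0.0, 1.0], "read_signs", "follow temporary signage"),
]
roadwork_result = plan({"cones": 3}, [0.9, 0.1], memory)
signage_result = plan({"sign": "stop"}, [0.1, 0.9], memory)
```

In this sketch the retrieved experience, not a fixed pipeline, decides which tool runs, which is one plausible reading of "memory-tool synergy"; a fuller version would presumably write the outcome of each decision back into memory to close the loop.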
