🤖 AI Summary
This paper targets three problems: insufficient planning safety in complex, low-frequency driving scenarios; weak generalization of end-to-end models; and the high computational overhead of vision-language models (VLMs). It proposes a fast-slow dual-system architecture. The fast system uses a lightweight end-to-end model for real-time trajectory generation, while the slow system, driven by a VLM, is activated only when real-time uncertainty estimation flags a scenario, and then performs semantic reasoning and decision correction. Key innovations include an uncertainty-aware on-demand switching mechanism, an information bottleneck augmented with high-level planning feedback, and a bidirectional knowledge exchange that combines visual prompting with decision feedback. The method integrates uncertainty estimation, question-answering-based reasoning, reward-instruct training, and information bottleneck optimization. In open-loop evaluation, the approach reduces average L2 trajectory error by 6.7% and collision rate by 28.1%.
📝 Abstract
Ensuring safe, comfortable, and efficient planning is crucial for autonomous driving systems. While end-to-end models trained on large datasets perform well in standard driving scenarios, they struggle with complex low-frequency events. Recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs) offer enhanced reasoning but suffer from computational inefficiency. Inspired by the dual-process cognitive model "Thinking, Fast and Slow", we propose $\textbf{FASIONAD}$, a novel dual-system framework that combines a fast end-to-end planner with a VLM-based reasoning module. The fast system leverages end-to-end learning to achieve real-time trajectory generation in common scenarios, while the slow system is activated through uncertainty estimation to perform contextual analysis and resolve complex scenarios. Our architecture introduces three key innovations: (1) a dynamic switching mechanism that enables slow-system intervention based on real-time uncertainty assessment; (2) an information bottleneck with high-level plan feedback that optimizes the slow system's guidance capability; (3) a bidirectional knowledge exchange in which visual prompts enhance the slow system's reasoning while its feedback refines the fast planner's decision-making. To strengthen VLM reasoning, we develop a question-answering mechanism coupled with a reward-instruct training strategy. In open-loop experiments, FASIONAD achieves a $6.7\%$ reduction in average $L2$ trajectory error and a $28.1\%$ lower collision rate.
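The uncertainty-gated switching described in the abstract can be illustrated with a minimal Python sketch. It is not the FASIONAD implementation: the abstract does not specify the uncertainty estimator, so the sketch assumes the fast planner returns several sampled trajectories and uses their positional variance as a proxy. The names `DualSystemPlanner`, `trajectory_uncertainty`, `stub_fast`, and `stub_slow`, and the `threshold` value, are all hypothetical.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def trajectory_uncertainty(samples: Sequence[Trajectory]) -> float:
    """Mean per-waypoint positional variance across sampled trajectories.

    Assumed proxy only: the source does not state how uncertainty is estimated.
    """
    n_steps = len(samples[0])
    total = 0.0
    for t in range(n_steps):
        total += pvariance([s[t][0] for s in samples])
        total += pvariance([s[t][1] for s in samples])
    return total / n_steps


def mean_trajectory(samples: Sequence[Trajectory]) -> Trajectory:
    """Average the sampled trajectories waypoint by waypoint."""
    n = len(samples)
    return [
        (sum(s[t][0] for s in samples) / n, sum(s[t][1] for s in samples) / n)
        for t in range(len(samples[0]))
    ]


@dataclass
class DualSystemPlanner:
    # Hypothetical interfaces standing in for the fast planner and the VLM.
    fast_planner: Callable[[dict], Sequence[Trajectory]]
    slow_reasoner: Callable[[dict, Trajectory], Trajectory]
    threshold: float = 0.5  # assumed value, chosen only for this example

    def plan(self, observation: dict) -> Tuple[Trajectory, bool]:
        """Return (trajectory, used_slow_system)."""
        samples = self.fast_planner(observation)
        proposal = mean_trajectory(samples)
        if trajectory_uncertainty(samples) <= self.threshold:
            return proposal, False  # common case: the fast path alone
        # Uncertain case: hand the proposal to the slow system for correction.
        return self.slow_reasoner(observation, proposal), True


def stub_fast(observation: dict) -> List[Trajectory]:
    """Toy fast planner: three samples spread laterally by observation['spread']."""
    s = observation["spread"]
    return [[(k * s, 1.0), (k * s, 2.0)] for k in (-1, 0, 1)]


def stub_slow(observation: dict, proposal: Trajectory) -> Trajectory:
    """Toy slow system: replaces the proposal with a stationary trajectory."""
    return [(0.0, 0.0) for _ in proposal]


planner = DualSystemPlanner(stub_fast, stub_slow, threshold=0.5)
easy_traj, easy_used_slow = planner.plan({"spread": 0.1})
hard_traj, hard_used_slow = planner.plan({"spread": 2.0})
```

With these stubs, the low-spread observation stays on the fast path and the high-spread observation triggers the slow system. In a real system the threshold trades VLM compute against how often uncertain scenes receive extra reasoning, and it would need to be tuned on data.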