🤖 AI Summary
Chain-of-thought (CoT) reasoning in autonomous driving suffers from significant latency due to sequential token generation, hindering real-time deployment. This work proposes FastDriveCoT, the first approach to parallel-decode templated CoT: subtask dependencies are modeled as a directed graph, decomposing structured reasoning into independent steps that can execute concurrently. Mutually independent reasoning steps are then generated synchronously within a single forward pass, combining dependency-graph modeling, parallel decoding, and a unified vision–language–action architecture. Evaluated across multiple model architectures, the method achieves a 3–4× speedup in CoT generation, substantially reducing end-to-end latency while maintaining or even improving downstream task performance.
📝 Abstract
Chain-of-Thought (CoT) reasoning enhances the decision-making capabilities of vision-language-action models in autonomous driving, but its autoregressive nature introduces significant inference latency, making it impractical for real-time applications. To address this, we introduce FastDriveCoT, a novel parallel decoding method that accelerates template-structured CoT. Our approach decomposes the reasoning process into a dependency graph of distinct sub-tasks, such as identifying critical objects and summarizing traffic rules, some of which can be generated in parallel. By generating multiple independent reasoning steps concurrently within a single forward pass, we significantly reduce the number of sequential computations. Experiments demonstrate a 3--4$\times$ speedup in CoT generation and a substantial reduction in end-to-end latency across various model architectures, all while preserving the downstream-task improvements that incorporating CoT reasoning brings.
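The scheduling idea behind the dependency graph can be illustrated with a small sketch. Sub-tasks that do not depend on each other are grouped into the same "level" (Kahn-style topological layering), so each level's reasoning steps could, in principle, be decoded in one forward pass. The sub-task names and dependency structure below are hypothetical, chosen to mirror the examples in the abstract; this is not the paper's actual implementation.

```python
from collections import defaultdict, deque

def parallel_decode_schedule(deps):
    """Group CoT sub-tasks into levels of mutually independent steps.

    `deps` maps each sub-task to the list of sub-tasks it depends on.
    All tasks in one returned level have no edges between them, so a
    parallel decoder could generate their tokens concurrently.
    """
    indegree = {t: len(d) for t, d in deps.items()}
    children = defaultdict(list)
    for task, parents in deps.items():
        for p in parents:
            children[p].append(task)
    # Start with sub-tasks that have no prerequisites.
    frontier = deque(t for t, n in indegree.items() if n == 0)
    levels = []
    while frontier:
        level = sorted(frontier)  # deterministic ordering for readability
        levels.append(level)
        frontier = deque()
        for t in level:
            for c in children[t]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    frontier.append(c)
    return levels

# Hypothetical templated CoT for driving: scene description first, then
# critical-object identification and traffic-rule summarization (independent),
# then the final plan that consumes both.
deps = {
    "scene": [],
    "critical_objects": ["scene"],
    "traffic_rules": ["scene"],
    "plan": ["critical_objects", "traffic_rules"],
}
print(parallel_decode_schedule(deps))
# → [['scene'], ['critical_objects', 'traffic_rules'], ['plan']]
```

Here four sequential sub-tasks collapse into three decoding rounds; with wider templates (more independent sub-tasks per level) the reduction in sequential steps grows accordingly, which is the source of the reported 3--4$\times$ speedup.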