🤖 AI Summary
Large objects on the road can occlude an autonomous vehicle's sensors and cause perception and decision-making failures. To address this safety risk, this paper proposes a vehicle-to-vehicle (V2V) cooperative driving framework that applies graph-of-thoughts reasoning to a multimodal large language model (MLLM). The method fuses heterogeneous V2V sensory data and uses graph-based reasoning to improve causal inference and long-horizon temporal understanding in complex occlusion scenarios. Its core contribution is bringing a structured graph-of-thoughts into MLLM-based cooperative autonomous driving, which the authors state has not been considered in prior work. The graph includes occlusion-aware perception and planning-aware prediction, and it links perception, motion prediction, and trajectory planning in a single reasoning process. Experiments on the newly curated V2V-GoT-QA dataset show that the V2V-GoT model outperforms the baselines on all three tasks: cooperative perception, prediction, and planning.
📝 Abstract
Current state-of-the-art autonomous vehicles can face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving incorporates a Multimodal Large Language Model (MLLM) to integrate the cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered in previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes two new ideas: occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks.
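As a rough illustration of the general graph-of-thoughts idea, the reasoning can be pictured as a directed acyclic graph in which each node is an intermediate question and is answered only after the nodes it depends on. The sketch below is not the V2V-GoT implementation: the node names, the `THOUGHT_GRAPH` layout, and the `answer_node` stub (which stands in for an MLLM call) are all assumptions made for illustration, loosely following the perception, prediction, and planning stages named in the abstract.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical thought graph: each node maps to the nodes whose answers it needs.
# The node names are illustrative assumptions, not the paper's actual graph.
THOUGHT_GRAPH = {
    "visible_objects": [],
    "occluded_objects": ["visible_objects"],
    "predicted_motion": ["visible_objects", "occluded_objects"],
    "planned_trajectory": ["predicted_motion"],
}


def answer_node(name, parent_answers):
    """Stand-in for an MLLM call; it only records which parent answers it received."""
    return f"{name}({', '.join(sorted(parent_answers))})"


def run_thought_graph(graph, answer_fn=answer_node):
    """Answer every node after all of its parents, in topological order."""
    answers = {}
    order = list(TopologicalSorter(graph).static_order())
    for node in order:
        parents = {p: answers[p] for p in graph[node]}
        answers[node] = answer_fn(node, parents)
    return order, answers


order, answers = run_thought_graph(THOUGHT_GRAPH)
print(order)
print(answers["planned_trajectory"])
```

In a real system, `answer_fn` would presumably prompt the model with the fused sensor context plus the parent answers; the point of the sketch is only that later questions such as planning are conditioned on earlier ones such as occlusion-aware perception.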