🤖 AI Summary
To address the unreliability of planning under occlusion or sensor failure in single-vehicle perception, this paper integrates multimodal large language models (MLLMs) into the vehicle-to-vehicle (V2V) cooperative autonomous driving loop, proposing the Vehicle-to-Vehicle Question-Answering (V2V-QA) problem setting and a corresponding benchmark dataset. Methodologically, the authors design an MLLM architecture that fuses perception information across connected vehicles and answers instruction-tuned driving questions, unifying scene grounding, notable object identification, and cooperative trajectory planning. The core contribution is applying LLMs to cooperative decision modeling, establishing an end-to-end multimodal understanding-and-planning framework for V2V cooperative planning. Experiments on the V2V-QA benchmark show that the method improves planning accuracy by 23.6% over conventional feature-level and decision-level fusion baselines, significantly enhancing robustness in complex scenarios.
📝 Abstract
Current autonomous driving vehicles rely mainly on their individual sensors to understand surrounding scenes and plan future trajectories, which can be unreliable when the sensors are malfunctioning or occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on detection and tracking. How those approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress in using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates an LLM into cooperative autonomous driving, with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose our baseline method, Vehicle-to-Vehicle Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and that it outperforms other baseline methods that use different fusion approaches. Our work also creates a new research direction that can improve the safety of future autonomous driving systems. Our project website: https://eddyhkchiu.github.io/v2vllm.github.io/.
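The abstract describes fusing perception information from multiple CAVs before an LLM answers driving questions. The minimal sketch below illustrates that fuse-then-prompt flow; all class names, schemas, and the mean-pooling fusion are illustrative assumptions for exposition, not the paper's actual V2V-LLM architecture.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CAVPerception:
    """Per-vehicle perception output shared over V2V (hypothetical schema)."""
    vehicle_id: str
    # Fixed-length feature vector summarizing this vehicle's view of the scene.
    scene_feature: List[float]
    # Detected objects as (label, x, y) in a shared world frame.
    detections: List[Tuple[str, float, float]]


def fuse_features(cavs: List[CAVPerception]) -> List[float]:
    """Element-wise mean over per-vehicle scene features.

    Mean pooling is one simple fusion choice; the paper's actual
    fusion mechanism may differ.
    """
    dim = len(cavs[0].scene_feature)
    n = len(cavs)
    return [sum(c.scene_feature[i] for c in cavs) / n for i in range(dim)]


def build_prompt(cavs: List[CAVPerception], question: str) -> str:
    """Serialize cross-vehicle detections into a text prompt for an LLM.

    A real system would pass fused features to a multimodal model; here we
    only show the question-answering interface as text.
    """
    lines = [
        f"{c.vehicle_id}: "
        + ", ".join(f"{lbl}@({x:.1f},{y:.1f})" for lbl, x, y in c.detections)
        for c in cavs
    ]
    return "Perception:\n" + "\n".join(lines) + f"\nQ: {question}"
```

For example, fusing two vehicles' features and asking a planning question would produce a single pooled vector plus one prompt listing both vehicles' detections, which is then handed to the (M)LLM for grounding, notable object identification, or planning.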