V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving

📅 2025-06-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Knowledge-driven autonomous driving faces two critical challenges: the limited perception range of onboard sensors and environmental hallucination in large language models. To address these, the paper proposes V2X-UniPool, a vehicle-infrastructure cooperative knowledge-pool framework built on a novel “time-indexed + language-based” dual-modality knowledge architecture that jointly encodes multimodal V2X time-series data and linguistically grounded environmental representations. A static-dynamic dual-query retrieval-augmented generation (RAG) mechanism enables joint reasoning over static infrastructure and dynamic traffic states. The paper also introduces lightweight knowledge distillation to support zero-shot on-vehicle deployment. Evaluated on a real-world cooperative driving dataset, the approach significantly improves motion-planning accuracy and reasoning consistency; the onboard model attains state-of-the-art performance, while V2X transmission cost is reduced by over 99.9% compared with prior V2X methods.
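
A minimal sketch of the dual-query idea, assuming V2X observations have already been converted into short text descriptions stamped with a time. Every name below (KnowledgeEntry, KnowledgePool, query_static, query_dynamic, build_prompt) is hypothetical, and the word-overlap scoring on the static path is a simple stand-in for whatever retriever the paper actually uses; this is an illustration, not the authors' implementation. Static infrastructure facts are ranked by relevance to the question, dynamic traffic facts are selected by a time window over the pool's time index, and both are merged into one grounded prompt.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


def tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric words; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class KnowledgeEntry:
    # Hypothetical record for one language-described V2X observation.
    timestamp: float  # seconds; serves as the pool's time index
    kind: str         # "static" (infrastructure) or "dynamic" (traffic state)
    text: str         # natural-language description of the observation


@dataclass
class KnowledgePool:
    entries: List[KnowledgeEntry] = field(default_factory=list)

    def add(self, entry: KnowledgeEntry) -> None:
        self.entries.append(entry)

    def query_static(self, question: str, k: int = 2) -> List[KnowledgeEntry]:
        # Static path: rank static entries by word overlap with the question
        # (a stand-in scorer; ties keep insertion order because sort is stable).
        q = tokens(question)
        static = [e for e in self.entries if e.kind == "static"]
        static.sort(key=lambda e: len(q & tokens(e.text)), reverse=True)
        return static[:k]

    def query_dynamic(self, now: float, window: float = 2.0) -> List[KnowledgeEntry]:
        # Dynamic path: traffic-state entries in [now - window, now], newest first.
        recent = [
            e for e in self.entries
            if e.kind == "dynamic" and now - window <= e.timestamp <= now
        ]
        return sorted(recent, key=lambda e: e.timestamp, reverse=True)


def build_prompt(pool: KnowledgePool, question: str, now: float, window: float = 2.0) -> str:
    # Merge both retrieval results into a single grounded prompt.
    lines = ["Static context:"]
    lines += [f"- {e.text}" for e in pool.query_static(question)]
    lines += ["Dynamic context:"]
    lines += [f"- t={e.timestamp:.1f}s: {e.text}" for e in pool.query_dynamic(now, window)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Toy usage with made-up observations.
pool = KnowledgePool()
pool.add(KnowledgeEntry(0.0, "static", "Crosswalk 30 m ahead at the signalized intersection"))
pool.add(KnowledgeEntry(0.0, "static", "Right lane ends 200 m ahead"))
pool.add(KnowledgeEntry(0.0, "static", "Speed limit 50 km/h"))
pool.add(KnowledgeEntry(5.0, "dynamic", "Truck parked in right lane"))
pool.add(KnowledgeEntry(8.0, "dynamic", "Pedestrian entering crosswalk from the right"))
pool.add(KnowledgeEntry(9.5, "dynamic", "Signal turns red for ego lane"))

question = "Is the crosswalk ahead clear?"
print(build_prompt(pool, question, now=10.0))
```

With now=10.0 and the 2-second window, the truck report from t=5.0 falls outside the window and is left out, so the prompt carries the two most relevant static facts plus the two recent dynamic ones. Keeping the paths separate lets slow-changing infrastructure be retrieved by relevance while fast-changing traffic state is retrieved by recency.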

📝 Abstract
Knowledge-driven autonomous driving systems (ADs) offer powerful reasoning capabilities but face two critical challenges: limited perception due to the short-sightedness of single-vehicle sensors, and hallucination arising from the lack of real-time environmental grounding. To address these issues, this paper introduces V2X-UniPool, a unified framework that integrates multimodal Vehicle-to-Everything (V2X) data into a time-indexed and language-based knowledge pool. Through a dual-query Retrieval-Augmented Generation (RAG) mechanism that retrieves both static and dynamic knowledge, the system enables ADs to perform accurate, temporally consistent reasoning over both the static environment and the dynamic traffic context. Experiments on a real-world cooperative driving dataset demonstrate that V2X-UniPool significantly enhances motion planning accuracy and reasoning capability. Remarkably, with V2X-UniPool even zero-shot vehicle-side models achieve state-of-the-art performance, while transmission cost is reduced by over 99.9% compared to prior V2X methods.
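
The bandwidth claim can be made concrete. Writing C_prior for the transmission cost of a prior V2X method and C_UniPool for that of V2X-UniPool (symbols introduced here only for illustration), a reduction of over 99.9% bounds their ratio as follows:

```latex
1 - \frac{C_{\text{UniPool}}}{C_{\text{prior}}} > 0.999
\;\Longrightarrow\;
C_{\text{UniPool}} < 0.001\, C_{\text{prior}}
\;\Longleftrightarrow\;
\frac{C_{\text{prior}}}{C_{\text{UniPool}}} > 1000
```

In other words, V2X-UniPool transmits less than one-thousandth of what the compared methods send, a more than 1000-fold saving.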
Problem

Research questions and friction points this paper is trying to address.

Overcoming limited perception from single-vehicle sensors
Reducing hallucination by grounding reasoning in real-time environmental data
Integrating multimodal V2X data for accurate autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies multimodal V2X data in a time-indexed, language-based knowledge pool
Uses dual-query RAG for static and dynamic knowledge
Lets zero-shot vehicle-side models reach state-of-the-art performance while cutting transmission cost by over 99.9%