Spatiotemporal Semantic V2X Framework for Cooperative Collision Prediction

📅 2026-01-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the stringent bandwidth and latency demands of real-time collision prediction in vehicular networks, which conventional V2X systems struggle to meet due to their reliance on transmitting raw video or high-dimensional perception data. To overcome this limitation, the authors propose a semantic V2X framework that introduces spatiotemporal semantic embeddings into V2X communication for the first time. Specifically, roadside units employ a Video Joint Embedding Predictive Architecture (V-JEPA) to generate compact semantic representations of future frames, which are then transmitted over V2X links to vehicles. Onboard, a lightweight attention-based probe and classifier decode these embeddings to predict collisions. Evaluated in a digital twin traffic environment, the approach achieves end-to-end cooperative early warning with four orders of magnitude lower communication overhead compared to raw video transmission, while simultaneously improving the F1-score by 10%, effectively balancing efficiency and accuracy.
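To make the scale of the claimed saving concrete, here is a back-of-the-envelope sketch comparing the payload of an uncompressed clip with that of an embedding. The resolution, clip length, token count, and embedding width are hypothetical values chosen for illustration only. They are not taken from the paper, and the helper names are made up for this example.

```python
import math


def raw_video_bytes(width, height, channels, frames, bytes_per_sample=1):
    """Size in bytes of an uncompressed clip (8-bit samples by default)."""
    return width * height * channels * frames * bytes_per_sample


def embedding_bytes(num_tokens, dim, bytes_per_value=4):
    """Size in bytes of an embedding payload (float32 values by default)."""
    return num_tokens * dim * bytes_per_value


def orders_of_magnitude(raw, compact):
    """Base-10 logarithm of the size ratio between the two payloads."""
    return math.log10(raw / compact)


# Hypothetical settings (assumptions, not figures from the paper):
# a 16-frame 1080p RGB clip versus one pooled 1024-dimensional vector.
raw = raw_video_bytes(width=1920, height=1080, channels=3, frames=16)
emb = embedding_bytes(num_tokens=1, dim=1024)
reduction = orders_of_magnitude(raw, emb)

print(f"raw clip:  {raw} bytes")
print(f"embedding: {emb} bytes")
print(f"reduction: about {reduction:.2f} orders of magnitude")
```

Under these assumed settings the ratio lands a little above four orders of magnitude. The actual figure depends on the codec, resolution, and embedding layout the authors used, none of which is stated in this summary.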

📝 Abstract

Intelligent Transportation Systems (ITS) demand real-time collision prediction to ensure road safety and reduce accident severity. Conventional approaches rely on transmitting raw video or high-dimensional sensory data from roadside units (RSUs) to vehicles, which is impractical under the bandwidth and latency constraints of vehicular communication. In this work, we propose a semantic V2X framework in which RSU-mounted cameras generate spatiotemporal semantic embeddings of future frames using the Video Joint Embedding Predictive Architecture (V-JEPA). To evaluate the system, we construct a digital twin of an urban traffic environment, enabling the generation of diverse traffic scenarios with both safe and collision events. The future-frame embeddings extracted from V-JEPA capture task-relevant traffic dynamics and are transmitted via V2X links to vehicles, where a lightweight attentive probe and classifier decode them to predict imminent collisions. By transmitting only semantic embeddings instead of raw frames, the proposed system significantly reduces communication overhead while maintaining predictive accuracy. Experimental results demonstrate that the framework, with an appropriate processing method, achieves a 10% F1-score improvement for collision prediction while reducing transmission requirements by four orders of magnitude compared to raw video. This validates the potential of semantic V2X communication to enable cooperative, real-time collision prediction in ITS.
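The vehicle-side decoding step can be pictured with a minimal sketch of attention pooling followed by a linear classifier. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the probe's architecture, so the single learned query, the absence of key/value projections, and all function names and toy weights below are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(tokens, query):
    """Pool token embeddings into one vector using a single learned query.

    Simplification (assumption): scaled dot-product attention without the
    key/value projections a full cross-attention layer would have.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]


def collision_probability(tokens, query, weight, bias):
    """Linear classifier with a sigmoid on top of the pooled embedding."""
    pooled = attentive_pool(tokens, query)
    logit = dot(weight, pooled) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy received embedding: three 2-dimensional tokens (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 0.0]      # a zero query gives uniform attention weights
weight = [1.0, -1.0]    # placeholder classifier weights, not trained
bias = 0.0

pooled = attentive_pool(tokens, query)
prob = collision_probability(tokens, query, weight, bias)
print(pooled, prob)
```

In a trained system the query, classifier weights, and bias would be learned from labelled safe and collision scenarios while the V-JEPA encoder stays on the RSU. Keeping the onboard part this small is what makes the decoder "lightweight".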

Problem

Research questions and friction points this paper is trying to address.

collision prediction

V2X communication

semantic representation

intelligent transportation systems

communication overhead

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic V2X

spatiotemporal embedding

collision prediction