🤖 AI Summary
To address the inefficiency in information transmission caused by high multi-modal perception data loads and dynamic, unstable channels in vehicle-infrastructure cooperative (V2X) systems, this paper proposes Generative AI-Enhanced Multi-modal Semantic Communication (G-MSC). G-MSC is the first framework to deeply integrate diffusion models and multi-modal large language models into semantic communication, establishing a hybrid analog-digital transmission mechanism and a task-adaptive semantic encoding-decoding architecture. It jointly optimizes cross-modal semantic alignment, noise-robust decoding, and channel-state-driven transmission mode switching. Experimental results on predictive V2X tasks demonstrate that G-MSC reduces communication overhead by 62%, improves semantic accuracy by 31%, and achieves a packet-loss resilience rate of 98.7%, significantly overcoming the generalization bottleneck of conventional semantic communication in dynamic V2X environments.
📝 Abstract
Vehicle-to-everything (V2X) communication supports numerous tasks, from driving safety to entertainment services. To achieve a holistic view, vehicles are typically equipped with multiple sensors to compensate for undetectable blind spots. However, processing large volumes of multi-modal data increases transmission load, while the dynamic nature of vehicular networks adds to transmission instability. To address these challenges, we propose a novel framework, Generative Artificial intelligence (GAI)-enhanced multi-modal semantic communication (SemCom), referred to as G-MSC, designed to handle various vehicular network tasks by employing suitable analog or digital transmission. GAI presents a promising opportunity to transform the SemCom framework by significantly enhancing semantic encoding to facilitate the optimized integration of multi-modal information, enhancing channel robustness, and fortifying semantic decoding against noise interference. To validate the effectiveness of the G-MSC framework, we conduct a case study showcasing its performance in vehicular communication networks for predictive tasks. The experimental results show that the design achieves reliable and efficient communication in V2X networks. In the end, we present future research directions on G-MSC.