SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework

📅 2025-03-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address key bottlenecks in bandwidth-constrained scenarios—including low perception accuracy, disjointed communication and sensing functionalities, and limited task versatility—this paper proposes a semantics-driven radar-vision multimodal sensing-communication integrated framework. Our method introduces three core innovations: (1) a novel multimodal semantic fusion network leveraging cross-modal attention for deep alignment of radar and visual features; (2) a large language model–guided, channel-adaptive semantic codec that jointly optimizes perception decoding and semantic transmission; and (3) a unified semantic space mapping with multi-task collaborative decoding architecture. Simulation results demonstrate substantial improvements over unimodal and conventional decoupled approaches across multiple tasks—including object detection, pose estimation, and action recognition—achieving a 37% reduction in end-to-end latency and a 2.1× increase in semantic transmission efficiency.

Technology Category

Application Category

📝 Abstract
Traditional single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency in bandwidth-constrained environments. Additionally, single-task-oriented sensing systems fail to address users' diverse demands. To overcome these challenges, we propose a semantic-driven integrated multimodal sensing and communication (SIMAC) framework. This framework leverages a joint source-channel coding architecture to achieve simultaneous sensing decoding and transmission of sensing results. Specifically, SIMAC first introduces a multimodal semantic fusion (MSF) network, which employs two extractors to extract semantic information from radar signals and images, respectively. MSF then applies cross-attention mechanisms to fuse these unimodal features and generate multimodal semantic representations. Secondly, we present a large language model (LLM)-based semantic encoder (LSE), where relevant communication parameters and multimodal semantics are mapped into a unified latent space and input to the LLM, enabling channel-adaptive semantic encoding. Thirdly, a task-oriented sensing semantic decoder (SSD) is proposed, in which different decoded heads are designed according to the specific needs of tasks. Simultaneously, a multi-task learning strategy is introduced to train the SIMAC framework, achieving diverse sensing services. Finally, experimental simulations demonstrate that the proposed framework achieves diverse sensing services and higher accuracy.
Problem

Research questions and friction points this paper is trying to address.

Overcomes limitations of single-modality sensing in accuracy and capability.
Reduces latency by integrating sensing and communication systems.
Addresses diverse user demands with a task-oriented multimodal framework.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal semantic fusion network for feature extraction
LLM-based semantic encoder for adaptive encoding
Task-oriented sensing semantic decoder for diverse services
Yubo Peng
Yubo Peng
南京大学
semantic communicationsgenerative artificial intelligencedeep learning
Luping Xiang
Luping Xiang
Research professor @ Nanjing University
wireless communication
K
Kun Yang
State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China, and the School of Intelligent Software and Engineering, Nanjing University (Suzhou Campus), Suzhou, China
F
Feibo Jiang
School of Information Science and Engineering, Hunan Normal University, Changsha, China
Kezhi Wang
Kezhi Wang
Professor, Royal Society Industry Fellow, Brunel University London
Wireless CommunicationEdge ComputingMachine Learning
Dapeng Oliver Wu
Dapeng Oliver Wu
City University of Hong Kong
machine learningcommunicationsvideo codingsignal processingcomputer vision