Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks

📅 2025-05-05
🏛️ IEEE Transactions on Mobile Computing
🤖 AI Summary
To address low semantic communication efficiency and poor visual question answering (VQA) accuracy under low signal-to-noise ratio (SNR) conditions in vehicular networks, this paper proposes the first large multimodal model (LMM)-driven semantic communication framework tailored for VQA tasks. Built upon LLaVA, it introduces a task-oriented semantic encoder that fuses user-attention-guided image patching with objective visual features. It also designs a semantic-importance-weighted transmission mechanism that adapts transmit energy at the semantic level. Under the same channel conditions as conventional communication methods, the framework improves VQA accuracy by 33.1% at 10 dB SNR and by 13.4% at 12 dB SNR, making the task markedly more robust and resource-efficient in noisy environments. Key contributions include: (i) the first integration of LMMs into vehicular semantic communication; (ii) a novel subjective–objective collaborative paradigm for image patch importance assessment; and (iii) semantic-importance-aware optimization of transmission energy.
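The summary does not state how the subjective and objective scores are combined, or how transmit energy follows from them. Below is a minimal sketch under two assumptions: the scores are fused by a convex combination with a hypothetical weight `alpha`, and a fixed energy budget is split in proportion to the fused importance. A hypothetical per-patch `floor` ensures that no patch is dropped entirely. `fuse_importance` and `allocate_energy` are illustrative names, not the paper's.

```python
from typing import List


def fuse_importance(objective: List[float], subjective: List[float],
                    alpha: float = 0.5) -> List[float]:
    """Fuse per-patch objective and subjective scores into weights that sum to 1.

    Hypothetical rule: alpha * objective + (1 - alpha) * subjective,
    with all scores assumed non-negative.
    """
    if not objective or len(objective) != len(subjective):
        raise ValueError("score lists must be non-empty and of equal length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    raw = [alpha * o + (1.0 - alpha) * s for o, s in zip(objective, subjective)]
    total = sum(raw)
    if total <= 0.0:
        # No patch scored above zero: fall back to uniform importance.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def allocate_energy(importance: List[float], total_energy: float,
                    floor: float = 0.0) -> List[float]:
    """Split a total energy budget across patches.

    Every patch receives `floor`; the remainder is shared in proportion to
    `importance` (assumed to sum to 1), so the allocations sum to total_energy.
    """
    if floor < 0.0 or floor * len(importance) > total_energy:
        raise ValueError("floor must be non-negative and fit within the budget")
    remaining = total_energy - floor * len(importance)
    return [floor + remaining * w for w in importance]


if __name__ == "__main__":
    # Three patches; the user's attention concentrates on the third one.
    weights = fuse_importance([0.2, 0.5, 0.3], [0.0, 0.2, 0.8], alpha=0.5)
    print([round(w, 3) for w in weights])
    print([round(e, 3) for e in allocate_energy(weights, total_energy=3.0, floor=0.5)])
```

A proportional split is only one option. A water-filling or learned policy could replace `allocate_energy` without changing the fusion step.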

📝 Abstract
Task-oriented semantic communication has emerged as a fundamental approach for enhancing performance in various communication scenarios. While recent advances in Generative Artificial Intelligence (GenAI), such as Large Language Models (LLMs), have been applied to semantic communication designs, the potential of Large Multimodal Models (LMMs) remains largely unexplored. In this paper, we investigate an LMM-based vehicle AI assistant using a Large Language and Vision Assistant (LLaVA) and propose a task-oriented semantic communication framework to facilitate efficient interaction between users and cloud servers. To reduce computational demands and shorten response time, we optimize LLaVA's image slicing to selectively focus on areas of utmost interest to users. Additionally, we assess the importance of image patches by combining objective and subjective user attention, adjusting energy usage for transmitting semantic information. This strategy optimizes resource utilization, ensuring precise transmission of critical information. We construct a Visual Question Answering (VQA) dataset for traffic scenarios to evaluate effectiveness. Experimental results show that our semantic communication framework significantly increases accuracy in answering questions under the same channel conditions, performing particularly well in environments with poor Signal-to-Noise Ratios (SNR). Accuracy is improved by 13.4% at an SNR of 12 dB and by 33.1% at 10 dB.
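The SNR figures are logarithmic. At 10 dB the average signal power is 10 times the noise power, and at 12 dB it is about 15.8 times. The sketch below assumes a real-valued AWGN channel whose noise floor is fixed by the average per-patch energy. That is an assumption, not the paper's stated channel model. It shows why shifting energy toward important patches raises their effective SNR while the nominal channel SNR stays the same, at the expense of less important patches. All function names are hypothetical.

```python
import math
import random
from typing import List


def db_to_linear(db: float) -> float:
    """Convert a power ratio from decibels to a linear factor."""
    return 10.0 ** (db / 10.0)


def linear_to_db(ratio: float) -> float:
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)


def noise_power_for(energies: List[float], channel_snr_db: float) -> float:
    """Noise power N such that (mean patch energy) / N equals the nominal SNR."""
    mean_energy = sum(energies) / len(energies)
    return mean_energy / db_to_linear(channel_snr_db)


def per_patch_snr_db(energies: List[float], channel_snr_db: float) -> List[float]:
    """Effective SNR of each patch when all patches share one noise floor."""
    n0 = noise_power_for(energies, channel_snr_db)
    return [linear_to_db(e / n0) for e in energies]


def awgn(symbols: List[float], noise_power: float, rng: random.Random) -> List[float]:
    """Add zero-mean real Gaussian noise with variance `noise_power` to each symbol."""
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in symbols]


if __name__ == "__main__":
    energies = [0.5, 1.0, 1.5]  # e.g. from an importance-weighted allocation
    print([round(s, 2) for s in per_patch_snr_db(energies, 10.0)])
    # Unit-amplitude features scaled by sqrt(energy), then passed through the channel.
    tx = [math.sqrt(e) for e in energies]
    print(awgn(tx, noise_power_for(energies, 10.0), random.Random(0)))
```

With energies `[0.5, 1.0, 1.5]` at a nominal 10 dB, the three patches see about 7.0 dB, 10.0 dB, and 11.8 dB, respectively.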
Problem

Research questions and friction points this paper is trying to address.

Exploring LMMs for task-oriented semantic communication in vehicle networks
Optimizing image processing to reduce computational load and response time
Enhancing transmission efficiency by prioritizing critical visual information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized LLaVA image slicing for user focus
Combined objective and subjective user attention
Energy-adjusted semantic information transmission
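The first innovation restricts LLaVA's image slicing to the user's area of interest. As a rough illustration, assume two things: the image is cut into a regular grid of square tiles (336 pixels here, a hypothetical choice), and the user's focus is available as a pixel bounding box. Under those assumptions, only tiles that intersect the box would be kept for encoding and transmission. The helpers `grid_tiles` and `select_tiles` are hypothetical, and they ignore any downscaled global view the model may also use.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels, half-open: covers x0 <= x < x1, y0 <= y < y1.
Box = Tuple[int, int, int, int]


def grid_tiles(width: int, height: int, tile: int) -> List[Box]:
    """Slice an image into a row-major grid of tile x tile boxes (edge tiles may be smaller)."""
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height and tile must be positive")
    return [
        (x0, y0, min(x0 + tile, width), min(y0 + tile, height))
        for y0 in range(0, height, tile)
        for x0 in range(0, width, tile)
    ]


def overlaps(a: Box, b: Box) -> bool:
    """True if two half-open boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_tiles(width: int, height: int, tile: int, roi: Box) -> List[int]:
    """Indices of grid tiles that intersect the user's region of interest."""
    return [i for i, box in enumerate(grid_tiles(width, height, tile)) if overlaps(box, roi)]


if __name__ == "__main__":
    # A 1008 x 672 frame in 336-pixel tiles gives a 3 x 2 grid (indices 0..5, row-major).
    roi = (700, 100, 900, 300)  # hypothetical box around, say, a traffic sign
    kept = select_tiles(1008, 672, 336, roi)
    print(kept, "of", len(grid_tiles(1008, 672, 336)), "tiles kept")
```

Here the demo keeps only tile 2, one of six. The compute and bandwidth savings scale with the fraction of tiles discarded.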
Baoxia Du
Institute of Science and Engineering, Kanazawa University, Kanazawa 920-1192, Japan
Hongyang Du
Department of Electrical and Electronic Engineering, University of Hong Kong, Pok Fu Lam, Hong Kong
Dusit Niyato
College of Computing and Data Science, Nanyang Technological University, Singapore
Ruidong Li
Associate Professor, Kanazawa University; Internet TC Chair; IEEE DL
Network Architecture, Metaverse, Quantum Network, Big Data, Network Security