Semantic-Aware Visual Information Transmission With Key Information Extraction Over Wireless Networks

2025-06-15
Citations: 0
Influential: 0
AI Summary
To jointly optimize computational efficiency, robustness, and reconstruction quality for image transmission over dynamic 6G channels, this paper proposes a semantic-aware deep joint source-channel coding (JSCC) framework. The method is a differentiable end-to-end architecture that combines foreground semantic extraction with adaptive background synthesis: MediaPipe-based pose estimation localizes semantically critical regions, while Rembg and a pre-trained background library enable lightweight background reconstruction. Foreground-focused encoding and simplified background synthesis are thus handled in a single optimization. Experiments show that the approach significantly outperforms conventional JSCC methods in PSNR under low-SNR conditions, reduces transmission overhead by approximately 35%, and maintains high visual fidelity, making it well suited to resource-constrained mobile multimedia applications.
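The "low SNR conditions" mentioned in the summary can be made concrete with a small channel simulation. The sketch below is not the paper's channel model: it assumes a real-valued additive white Gaussian noise (AWGN) channel and unit-power normalization of the encoder output, both common conventions in JSCC experiments. The function name `awgn_channel` and the random `latent` vector are hypothetical stand-ins.

```python
import numpy as np

def awgn_channel(symbols: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Normalize symbols to unit average power, then add Gaussian noise at snr_db."""
    # Power normalization is an assumed convention, not taken from the paper.
    power = np.mean(symbols ** 2)
    normalized = symbols / np.sqrt(power)
    # With unit signal power, the noise variance is 10^(-SNR_dB / 10).
    noise_var = 10.0 ** (-snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=symbols.shape)
    return normalized + noise

rng = np.random.default_rng(0)
latent = rng.normal(size=10_000)  # hypothetical stand-in for an encoder output
received_low = awgn_channel(latent, snr_db=0.0, rng=rng)    # noise power ~ signal power
received_high = awgn_channel(latent, snr_db=20.0, rng=rng)  # noise power ~ 1% of signal power
```

At 0 dB the noise is as strong as the signal, which is the regime where the summary reports the largest PSNR gains over conventional JSCC.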

๐Ÿ“ Abstract
The advent of 6G networks demands unprecedented levels of intelligence, adaptability, and efficiency to address challenges such as ultra-high-speed data transmission, ultra-low latency, and massive connectivity in dynamic environments. Traditional wireless image transmission frameworks, reliant on static configurations and isolated source-channel coding, struggle to balance computational efficiency, robustness, and quality under fluctuating channel conditions. To bridge this gap, this paper proposes an AI-native deep joint source-channel coding (JSCC) framework tailored for resource-constrained 6G networks. Our approach integrates key information extraction and adaptive background synthesis to enable intelligent, semantic-aware transmission. Leveraging AI-driven tools, Mediapipe for human pose detection and Rembg for background removal, the model dynamically isolates foreground features and matches backgrounds from a pre-trained library, reducing data payloads while preserving visual fidelity. Experimental results demonstrate significant improvements in peak signal-to-noise ratio (PSNR) compared with traditional JSCC method, especially under low-SNR conditions. This approach offers a practical solution for multimedia services in resource-constrained mobile communications.
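For readers unfamiliar with the metric, PSNR is defined as 10 log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the original and reconstructed images. A minimal sketch, assuming 8-bit images (MAX = 255); the function name `psnr` is illustrative and not taken from the paper:

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# A uniform error of 25.5 gray levels gives MSE = 650.25, so MAX^2 / MSE = 100.
reference = np.zeros((4, 4))
reconstruction = np.full((4, 4), 25.5)
score = psnr(reference, reconstruction)
```

Higher values mean a closer reconstruction; each 10 dB corresponds to a tenfold reduction in mean squared error.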
Problem

Research questions and friction points this paper is trying to address.

Enhance visual data transmission efficiency in 6G networks
Balance computational efficiency and quality in dynamic channels
Reduce data payloads while preserving visual fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-native deep joint source-channel coding
Key information extraction and adaptive synthesis
Dynamic foreground isolation and background matching
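The foreground isolation and background matching idea can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the foreground mask is synthetic here (in the paper it would come from MediaPipe and Rembg), the background library is three flat images, and matching uses plain mean squared error over background pixels. All function names are hypothetical, and the mask is assumed to leave at least one background pixel.

```python
import numpy as np

def split_foreground(image, mask):
    """Keep only pixels where mask is True; zero out the rest."""
    return image * mask[..., None]

def match_background(image, mask, library):
    """Index of the library image closest (by MSE) to the image's background region."""
    bg = ~mask
    target = image[bg].astype(np.float64)
    errors = [np.mean((target - cand[bg].astype(np.float64)) ** 2) for cand in library]
    return int(np.argmin(errors))

def composite(foreground, mask, background):
    """Receiver side: paste the foreground over the selected library background."""
    return np.where(mask[..., None], foreground, background)

H, W = 8, 8
# Toy library of flat backgrounds shared by sender and receiver (assumption).
library = [np.full((H, W, 3), v, dtype=np.uint8) for v in (0, 100, 200)]
mask = np.zeros((H, W), dtype=bool)
mask[2:6, 2:6] = True          # synthetic foreground region
image = library[1].copy()
image[mask] = 255              # bright "subject" on the second background

fg = split_foreground(image, mask)
idx = match_background(image, mask, library)
recon = composite(fg, mask, library[idx])
```

In this toy case the sender only needs to convey the foreground pixels, the mask, and one library index instead of the full image, which is the source of the payload reduction the paper describes. Real backgrounds will not match a library entry exactly, so the reconstruction is approximate in practice.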
Authors

Chen Zhu
School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China, and also with Polytechnic Institute, Zhejiang University, Hangzhou 310015, China

Kang Liang
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Jianrong Bao
School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China

Zhouxiang Zhao
Zhejiang University
Semantic Communications, Wireless Communications, Internet of Agents

Zhaohui Yang
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Zhaoyang Zhang
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Mohammad Shikh-Bahaei
Professor of Telecommunications, King's College London
Wireless Communications, Signal Processing, Multimedia