Text-Guided Diffusion Model-based Generative Communication for Wireless Image Transmission

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address severe visual-quality degradation in image transmission over ultra-low-bitrate wireless channels, this paper proposes a diffusion-based generative semantic communication framework. Departing from conventional pixel-level reconstruction, the method combines joint source-channel coding with text-guided semantic priors to recover images that are semantically consistent and visually natural at extremely low bitrates. It integrates Stable Diffusion's generative prior with ControlNet's structural constraints, using transmitted text prompts to drive semantic-level reconstruction and thereby substantially reduce bandwidth requirements. Experiments show that the framework outperforms traditional codecs and state-of-the-art deep learning baselines across challenging channel conditions, achieving superior perceptual quality (lower LPIPS and FID scores) and stronger error resilience. This work establishes a new paradigm for high-fidelity visual communication under extreme bandwidth constraints.
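
The receiver-side flow described above can be sketched with off-the-shelf tools. The snippet below is a minimal illustration using the Hugging Face diffusers library, not the authors' implementation: the checkpoint names and the Canny-edge ControlNet are illustrative assumptions, and `jscc_decoded.png` stands in for the corrupted low-rate image recovered from the channel.

```python
import numpy as np
import torch
import cv2
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Load a ControlNet (Canny-edge variant, an illustrative choice) on top of SD 1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# `jscc_decoded.png` is a placeholder for the degraded low-rate image recovered
# from the channel; its Canny map is the structural condition fed to ControlNet.
degraded = Image.open("jscc_decoded.png").convert("RGB")
edges = cv2.Canny(np.array(degraded), 100, 200)
condition = Image.fromarray(np.stack([edges] * 3, axis=-1))

# The transmitted text prompt supplies the semantic prior guiding generation.
prompt = "a red sports car parked by the sea, photorealistic"
restored = pipe(prompt, image=condition, num_inference_steps=30).images[0]
restored.save("reconstructed.png")
```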

📝 Abstract
Reliable image transmission over wireless channels is particularly challenging at extremely low transmission rates, where conventional compression and channel coding schemes fail to preserve adequate visual quality. To address this issue, we propose a generative communication framework based on diffusion models, which integrates joint source-channel coding (JSCC) with semantic-guided reconstruction leveraging a pre-trained generative model. Unlike conventional architectures that aim to recover exact pixel values of the original image, the proposed method focuses on preserving and reconstructing semantically meaningful visual content under severely constrained rates, ensuring perceptual plausibility and faithfulness to the scene intent. Specifically, the transmitter encodes the source image via JSCC and jointly transmits it with a textual prompt over the wireless channel. At the receiver, the corrupted low-rate representation is fused with the prompt and reconstructed through a Stable Diffusion model with ControlNet, enabling high-quality visual recovery. Leveraging both generative priors and semantic guidance, the proposed framework produces perceptually convincing images even under extreme bandwidth limitations. Experimental results demonstrate that the proposed method consistently outperforms conventional coding-based schemes and deep learning baselines, achieving superior perceptual quality and robustness across various channel conditions.
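
As a rough sketch of the transmitter side the abstract describes, the following PyTorch code shows a learned JSCC encoder whose output symbols are power-normalized and passed through an AWGN channel. The architecture, dimensions, and SNR are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class JSCCEncoder(nn.Module):
    """Toy JSCC encoder: image -> short vector of channel symbols."""
    def __init__(self, channel_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, channel_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.net(x)
        # Normalize so the average transmit power per symbol equals 1.
        return z / z.norm(dim=1, keepdim=True) * z.shape[1] ** 0.5

def awgn(z: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Add white Gaussian noise at the given per-symbol SNR (in dB)."""
    noise_std = 10 ** (-snr_db / 20)  # unit signal power assumed
    return z + noise_std * torch.randn_like(z)

x = torch.rand(1, 3, 256, 256)            # source image batch
z_rx = awgn(JSCCEncoder()(x), snr_db=10)  # symbols seen by the receiver
```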
Problem

Research questions and friction points this paper is trying to address.

Addresses the challenge of reliable image transmission at extremely low wireless rates
Prioritizes semantic content over pixel-level accuracy under tight bandwidth constraints
Enhances visual quality using generative priors and text guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion models for generative communication
Integrates JSCC with semantic text guidance
Employs Stable Diffusion and ControlNet for reconstruction
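
The perceptual metrics cited in the summary (LPIPS and FID, lower is better) can be computed as sketched below with the lpips and torchmetrics packages. The random tensors are placeholders for batches of reconstructed and reference images; real evaluation would use the full test set.

```python
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

ref = torch.rand(8, 3, 256, 256)  # reference images in [0, 1] (placeholder)
rec = torch.rand(8, 3, 256, 256)  # reconstructed images in [0, 1] (placeholder)

# LPIPS expects inputs scaled to [-1, 1].
loss_fn = lpips.LPIPS(net="alex")
lpips_score = loss_fn(ref * 2 - 1, rec * 2 - 1).mean()

# FID accumulates Inception feature statistics over real and generated batches;
# the default torchmetrics setup expects uint8 images. Tiny batch for illustration.
fid = FrechetInceptionDistance(feature=2048)
fid.update((ref * 255).to(torch.uint8), real=True)
fid.update((rec * 255).to(torch.uint8), real=False)
print(f"LPIPS: {lpips_score.item():.4f}  FID: {fid.compute().item():.2f}")
```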
Shengkang Chen
Cooperative Medianet Innovation Center and the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Tong Wu
Cooperative Medianet Innovation Center and the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Zhiyong Chen
Shanghai Jiao Tong University
6G networks, Wireless Communications, Computing and Caching Networks
Feng Yang
Cooperative Medianet Innovation Center and the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Meixia Tao
Professor at Shanghai Jiao Tong University; Fellow of IEEE
wireless communications, caching, edge computing, 5G+
Wenjun Zhang
City University of Hong Kong
Thin film technology, nanomaterials and nanodevices