Text-Guided Diffusion Model-based Generative Communication for Wireless Image Transmission

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address severe visual-quality degradation in image transmission over ultra-low-bitrate wireless channels, this paper proposes a diffusion-based generative semantic communication framework. Departing from conventional pixel-level reconstruction, the method combines joint source-channel coding with text-guided semantic priors to recover images that are semantically consistent and visually natural at extremely low bitrates. It integrates Stable Diffusion's generative prior with ControlNet's structural constraints, using transmitted text prompts to drive semantic-level reconstruction and thereby substantially reduce bandwidth requirements. Experiments show that the framework outperforms traditional codecs and state-of-the-art deep learning baselines across challenging channel conditions, achieving superior perceptual quality (lower LPIPS and FID scores) and stronger error resilience. This work establishes a new paradigm for high-fidelity visual communication under extreme bandwidth constraints.
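
The receiver-side flow described above can be sketched with off-the-shelf tools. The snippet below is a minimal illustration using the Hugging Face diffusers library, not the authors' implementation: the checkpoint names and the Canny-edge ControlNet are illustrative assumptions, and `jscc_decoded.png` stands in for the corrupted low-rate image recovered from the channel.

```python
import numpy as np
import torch
import cv2
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Load a ControlNet (Canny-edge variant, an illustrative choice) on top of SD 1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# `jscc_decoded.png` is a placeholder for the degraded low-rate image recovered
# from the channel; its Canny map is the structural condition fed to ControlNet.
degraded = Image.open("jscc_decoded.png").convert("RGB")
edges = cv2.Canny(np.array(degraded), 100, 200)
condition = Image.fromarray(np.stack([edges] * 3, axis=-1))

# The transmitted text prompt supplies the semantic prior guiding generation.
prompt = "a red sports car parked by the sea, photorealistic"
restored = pipe(prompt, image=condition, num_inference_steps=30).images[0]
restored.save("reconstructed.png")
```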

📝 Abstract
Reliable image transmission over wireless channels is particularly challenging at extremely low transmission rates, where conventional compression and channel coding schemes fail to preserve adequate visual quality. To address this issue, we propose a generative communication framework based on diffusion models, which integrates joint source-channel coding (JSCC) with semantic-guided reconstruction leveraging a pre-trained generative model. Unlike conventional architectures that aim to recover exact pixel values of the original image, the proposed method focuses on preserving and reconstructing semantically meaningful visual content under severely constrained rates, ensuring perceptual plausibility and faithfulness to the scene intent. Specifically, the transmitter encodes the source image via JSCC and jointly transmits it with a textual prompt over the wireless channel. At the receiver, the corrupted low-rate representation is fused with the prompt and reconstructed through a Stable Diffusion model with ControlNet, enabling high-quality visual recovery. Leveraging both generative priors and semantic guidance, the proposed framework produces perceptually convincing images even under extreme bandwidth limitations. Experimental results demonstrate that the proposed method consistently outperforms conventional coding-based schemes and deep learning baselines, achieving superior perceptual quality and robustness across various channel conditions.
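
As a rough sketch of the transmitter side the abstract describes, the following PyTorch code shows a learned JSCC encoder whose output symbols are power-normalized and passed through an AWGN channel. The architecture, dimensions, and SNR are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class JSCCEncoder(nn.Module):
    """Toy JSCC encoder: image -> short vector of channel symbols."""
    def __init__(self, channel_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, channel_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.net(x)
        # Normalize so the average transmit power per symbol equals 1.
        return z / z.norm(dim=1, keepdim=True) * z.shape[1] ** 0.5

def awgn(z: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Add white Gaussian noise at the given per-symbol SNR (in dB)."""
    noise_std = 10 ** (-snr_db / 20)  # unit signal power assumed
    return z + noise_std * torch.randn_like(z)

x = torch.rand(1, 3, 256, 256)            # source image batch
z_rx = awgn(JSCCEncoder()(x), snr_db=10)  # symbols seen by the receiver
```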
Problem

Research questions and friction points this paper is trying to address.

Addresses the challenge of reliable image transmission at extremely low wireless rates
Prioritizes semantic content over pixel-level accuracy under tight bandwidth constraints
Enhances visual quality using generative priors and text guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion models for generative communication
Integrates JSCC with semantic text guidance
Employs Stable Diffusion and ControlNet for reconstruction
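
The perceptual metrics cited in the summary (LPIPS and FID, lower is better) can be computed as sketched below with the lpips and torchmetrics packages. The random tensors are placeholders for batches of reconstructed and reference images; real evaluation would use the full test set.

```python
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

ref = torch.rand(8, 3, 256, 256)  # reference images in [0, 1] (placeholder)
rec = torch.rand(8, 3, 256, 256)  # reconstructed images in [0, 1] (placeholder)

# LPIPS expects inputs scaled to [-1, 1].
loss_fn = lpips.LPIPS(net="alex")
lpips_score = loss_fn(ref * 2 - 1, rec * 2 - 1).mean()

# FID accumulates Inception feature statistics over real and generated batches;
# the default torchmetrics setup expects uint8 images. Tiny batch for illustration.
fid = FrechetInceptionDistance(feature=2048)
fid.update((ref * 255).to(torch.uint8), real=True)
fid.update((rec * 255).to(torch.uint8), real=False)
print(f"LPIPS: {lpips_score.item():.4f}  FID: {fid.compute().item():.2f}")
```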
Shengkang Chen
Cooperative Medianet Innovation Center and the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Tong Wu
Cooperative Medianet Innovation Center and the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Zhiyong Chen
Shanghai Jiao Tong University
6G networks, Wireless Communications, Computing and Caching Networks
Feng Yang
Cooperative Medianet Innovation Center and the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Meixia Tao
Professor at Shanghai Jiao Tong University; Fellow of IEEE
wireless communications, caching, edge computing, 5G+
Wenjun Zhang
City University of Hong Kong
Thin film technology, nanomaterials and nanodevices