Latent Feature-Guided Conditional Diffusion for High-Fidelity Generative Image Semantic Communication

📅 2025-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current DeepJSCC approaches for 6G semantic communication rely heavily on pixel-level metrics and neglect human perceptual requirements. To address this, the authors propose a high-fidelity generative image semantic communication framework. The method maps images into a nonlinear neural latent space, applies adaptive-length joint source-channel coding (JSCC) to transmit the latent features, and, as its central novelty, uses a conditional diffusion model at the receiver to synthesize high-fidelity, semantically consistent images conditioned on the decoded latent features. An SNR-adaptive mechanism further lets a single model balance perceptual quality, semantic consistency, and channel robustness across channel states. Experiments show an average LPIPS reduction of 43.3% compared to DeepJSCC, with markedly improved noise resilience and semantic fidelity.

📝 Abstract
Semantic communication is expected to improve the efficiency and effectiveness of massive data transmission over sixth generation (6G) networks. However, existing deep learning-based joint source and channel coding (DeepJSCC) image semantic communication schemes predominantly focus on optimizing pixel-level metrics and neglect human perceptual requirements, which results in degraded perceptual quality. To address this issue, we propose a latent representation-oriented image semantic communication (LRISC) system, which transmits latent semantic features for image generation with semantic consistency, thereby ensuring the perceptual quality at the receiver. In particular, we first map the source image to latent features in a high-dimensional semantic space via a neural network (NN)-based non-linear transformation. Subsequently, these features are encoded using a joint source and channel coding (JSCC) scheme with adaptive coding length for efficient transmission over a wireless channel. At the receiver, a conditional diffusion model is developed by using the received latent features as conditional guidance to steer the reverse diffusion process, progressively reconstructing high-fidelity images while preserving semantic consistency. Moreover, we introduce a channel signal-to-noise ratio (SNR) adaptation mechanism, allowing one model to work across various channel states. Experiments show that the proposed method significantly outperforms existing methods in terms of learned perceptual image patch similarity (LPIPS) and robustness against channel noise, with an average LPIPS reduction of 43.3% compared to DeepJSCC, while guaranteeing semantic consistency.
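The receiver-side idea in the abstract, running a reverse diffusion process steered by the received latent features, can be sketched in miniature. This is not the paper's model: the linear "denoiser" `predict_noise`, the toy dimensions, and the DDPM-style update below are illustrative assumptions standing in for a trained conditional network.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50                                  # number of diffusion steps (toy)
betas = np.linspace(1e-4, 0.02, T)      # standard linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

D, Z = 16, 4                            # image and latent dimensions (toy)
W_x = rng.normal(scale=0.1, size=(D, D))
W_z = rng.normal(scale=0.1, size=(D, Z))

def predict_noise(x_t, z, t):
    """Stand-in noise predictor conditioned on the decoded latent z."""
    # A real system would use a trained network; this fixed linear map
    # only illustrates how the latent condition enters the prediction.
    return W_x @ x_t + W_z @ z

def reverse_diffusion(z):
    """Generate an image from pure noise, guided by latent features z."""
    x = rng.normal(size=D)              # start from Gaussian noise x_T
    for t in reversed(range(T)):
        eps = predict_noise(x, z, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                       # add noise except at the final step
            x = x + np.sqrt(betas[t]) * rng.normal(size=D)
    return x

z_received = rng.normal(size=Z)         # latent features after JSCC decoding
x_hat = reverse_diffusion(z_received)
print(x_hat.shape)
```

The key point the sketch captures is that the condition `z` is injected at every reverse step, so the generated sample is progressively pulled toward images consistent with the transmitted semantics.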
Problem

Research questions and friction points this paper is trying to address.

Enhancing perceptual quality in image semantic communication
Transmitting latent features for semantic-consistent image generation
Adapting to varying channel states for robust performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent feature-guided conditional diffusion model
Adaptive joint source and channel coding
Channel SNR adaptation mechanism
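The SNR adaptation mechanism listed above is stated only at a high level (one model working across channel states). A common way to realize this is FiLM-style feature modulation driven by the channel SNR; the sketch below follows that pattern under assumed names and shapes, and is not claimed to be the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
F = 8                                    # feature width (toy assumption)
W_gamma = rng.normal(scale=0.1, size=F)  # SNR -> per-feature scale
W_beta = rng.normal(scale=0.1, size=F)   # SNR -> per-feature shift

def snr_adapt(features, snr_db):
    """Modulate features by a scale/shift derived from the channel SNR (dB)."""
    s = snr_db / 20.0                    # crude normalization of the SNR
    gamma = 1.0 + W_gamma * s            # identity modulation at 0 dB
    beta = W_beta * s
    return gamma * features + beta

feat = rng.normal(size=F)
low = snr_adapt(feat, 0.0)               # identical to feat at 0 dB here
high = snr_adapt(feat, 20.0)             # rescaled/shifted at high SNR
print(low.shape, high.shape)
```

In a trained system the scale and shift would come from a small network so the encoder can allocate representation capacity differently at low and high SNR, which is what lets a single model cover the range of channel states.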