Beyond Hallucinations: A Multimodal-Guided Task-Aware Generative Image Compression for Ultra-Low Bitrate

📅 2025-12-06
🤖 AI Summary
To address semantic distortion caused by hallucination in ultra-low-bitrate (<0.05 bpp) image compression for 6G semantic communications, this paper proposes a multimodal, task-aware generative compression framework (MTGC). It introduces a task-aware module that generates semantic pseudo-words and a dual-path diffusion decoder that aligns fine-grained semantics via cross-attention and ControlNet-based residual injection, jointly fusing a text caption, a highly compressed image, and the semantic pseudo-words. The work is presented as the first generative compression framework to deeply integrate multimodal guidance with task-driven semantic modeling. Experiments on DIV2K show a 10.59% reduction in DISTS score over state-of-the-art generative and conventional compression methods, with a better trade-off among semantic consistency, perceptual quality, and pixel-level fidelity.

📝 Abstract
Generative image compression has recently shown impressive perceptual quality, but often suffers from semantic deviations caused by generative hallucinations at ultra-low bitrate (bpp < 0.05), limiting its reliable deployment in bandwidth-constrained 6G semantic communication scenarios. In this work, we reassess the positioning and role of multimodal guidance, and propose a Multimodal-Guided Task-Aware Generative Image Compression (MTGC) framework. Specifically, MTGC integrates three guidance modalities to enhance semantic consistency: a concise but robust text caption for global semantics, a highly compressed image (HCI) retaining low-level visual information, and Semantic Pseudo-Words (SPWs) for fine-grained task-relevant semantics. The SPWs are generated by our Task-Aware Semantic Compression Module (TASCM), which operates in a task-oriented manner, driving the multi-head self-attention mechanism to focus on and extract semantics relevant to the generation task while filtering out redundancy. To make these modalities work together, we design a Multimodal-Guided Diffusion Decoder (MGDD) with a dual-path cooperative guidance mechanism that combines cross-attention and ControlNet additive residuals to precisely inject the three forms of guidance into the diffusion process, and leverages the diffusion model's powerful generative priors to reconstruct the image. Extensive experiments demonstrate that MTGC consistently improves semantic consistency (e.g., DISTS drops by 10.59% on the DIV2K dataset) while also achieving remarkable gains in perceptual quality and pixel-level fidelity at ultra-low bitrate.
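To give a sense of how tight the bpp < 0.05 regime is, the short calculation below converts a bitrate into a bit and byte budget for one image. It is only an arithmetic illustration: the 768x512 image size and the helper names `bit_budget` and `ratio_vs_raw_rgb` are assumptions chosen for the example, not values or code from the paper.

```python
import math


def bit_budget(width: int, height: int, bpp: float) -> tuple[int, int]:
    """Return (whole bits, whole bytes) available for one image at a given bitrate."""
    total_bits = math.floor(width * height * bpp)
    return total_bits, total_bits // 8


def ratio_vs_raw_rgb(bpp: float, raw_bits_per_pixel: int = 24) -> float:
    """Compression ratio relative to uncompressed 8-bit RGB (24 bits per pixel)."""
    return raw_bits_per_pixel / bpp


# Hypothetical 768x512 image at the 0.05 bpp ceiling named in the abstract.
bits, nbytes = bit_budget(768, 512, 0.05)
print(f"{bits} bits, about {nbytes} bytes")
print(f"roughly {ratio_vs_raw_rgb(0.05):.0f}:1 versus raw RGB")
```

Since bpp counts every transmitted bit, whatever is sent for such an image (presumably the caption, the HCI, and the SPWs together) has to fit in a budget of this order, which is why so much of the reconstruction has to come from the decoder's generative prior.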
Problem

Research questions and friction points this paper is trying to address.

Addresses semantic deviations in ultra-low bitrate generative image compression
Enhances semantic consistency using multimodal guidance for 6G communication
Integrates text, compressed image, and task-aware semantics to reduce hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal guidance enhances semantic consistency
Task-aware compression extracts fine-grained semantics
Dual-path diffusion decoder injects multimodal guidance
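The dual-path idea listed above (cross-attention plus ControlNet-style additive residuals) can be illustrated with a toy sketch. This is not the paper's MGDD. It assumes, for illustration only, that the text and SPW tokens enter through cross-attention while the HCI contributes an additive residual; the abstract does not state which modality uses which path. The attention here is single-head with no learned projections, and all names (`cross_attention`, `dual_path_step`, `residual_scale`) are hypothetical. Plain Python lists are used to keep the example dependency-free.

```python
import math

Vector = list[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: list[Vector], context: list[Vector]) -> list[Vector]:
    """Each query attends over the context tokens (keys and values are the tokens themselves)."""
    scale = 1.0 / math.sqrt(len(context[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, token) * scale for token in context])
        attended.append(
            [sum(w * token[d] for w, token in zip(weights, context)) for d in range(len(q))]
        )
    return attended


def dual_path_step(
    latent: list[Vector],
    text_tokens: list[Vector],
    pseudo_words: list[Vector],
    hci_residual: list[Vector],
    residual_scale: float = 1.0,
) -> list[Vector]:
    """Toy update: latent + cross-attention(text, SPWs) + scaled additive residual."""
    # Path 1 (assumed): text and SPW tokens are concatenated as attention context.
    attended = cross_attention(latent, text_tokens + pseudo_words)
    # Path 2 (assumed): an HCI-derived feature map is added as a residual.
    return [
        [h + a + residual_scale * r for h, a, r in zip(h_row, a_row, r_row)]
        for h_row, a_row, r_row in zip(latent, attended, hci_residual)
    ]


# One latent position with two channels, one text token, one pseudo-word.
out = dual_path_step(
    latent=[[0.0, 0.0]],
    text_tokens=[[1.0, 0.0]],
    pseudo_words=[[1.0, 0.0]],
    hci_residual=[[0.5, -0.5]],
)
print(out)
```

In a real diffusion decoder both paths would act inside the denoising network at every step, with learned projections and multiple heads; the sketch only shows how an attention-based signal and an additive residual can be combined in a single update.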
Kaile Wang
Peking University
Lijun He
General Electric Global Research Center
Haisheng Fu
The University of British Columbia, Postdoctoral Fellow
Deep Learning, Image Compression, Video Compression, IC Design, Hardware Implementation, Cryptography
Haixia Bi
Shaanxi Key Laboratory of Deep Space Exploration Intelligent Information Technology, School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an, 710049, China
Fan Li
Shaanxi Key Laboratory of Deep Space Exploration Intelligent Information Technology, School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an, 710049, China