Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis

📅 2025-10-31

📈 Citations: 0

✨ Influential: 0

career value

212K/year

🤖 AI Summary

In ultra-low-bandwidth scenarios—such as deep-space exploration and battlefield reconnaissance—existing text-to-image generation methods reconstruct only at the semantic level, failing to meet requirements for visual fidelity and analytical accuracy. Method: We propose a text-guided joint latent correction flow model that jointly conditions a flow-based generative network on both textual semantic descriptions and deeply compressed latent variables, enabling semantically precise, structurally faithful, and visually usable image reconstruction at ultra-low bitrates. Our approach end-to-end integrates deep image compression with generative modeling, jointly optimizing semantic consistency and geometric structure recovery in the latent space. Results: At 0.05–0.1 bpp, our method outperforms state-of-the-art compression-plus-generation baselines by +2.1 dB PSNR and +3.7% mAP for downstream object detection, demonstrating superior bandwidth efficiency and compatibility with vision analytics tasks.

Technology Category

Application Category

📝 Abstract

We consider the problem of ultra-low bit rate visual communication for remote vision analysis, human interactions and control in challenging scenarios with very low communication bandwidth, such as deep space exploration, battlefield intelligence, and robot navigation in complex environments. In this paper, we ask the following important question: can we accurately reconstruct the visual scene using only a very small portion of the bit rate in existing coding methods while not sacrificing the accuracy of vision analysis and performance of human interactions? Existing text-to-image generation models offer a new approach for ultra-low bitrate image description. However, they can only achieve a semantic-level approximation of the visual scene, which is far insufficient for the purpose of visual communication and remote vision analysis and human interactions. To address this important issue, we propose to seamlessly integrate image generation with deep image compression, using joint text and coding latent to guide the rectified flow models for precise generation of the visual scene. The semantic text description and coding latent are both encoded and transmitted to the decoder at a very small bit rate. Experimental results demonstrate that our method can achieve the same image reconstruction quality and vision analysis accuracy as existing methods while using much less bandwidth. The code will be released upon paper acceptance.

Problem

Research questions and friction points this paper is trying to address.

Achieving ultra-low bitrate visual communication for remote analysis

Accurately reconstructing scenes without sacrificing analysis accuracy

Integrating image generation with compression for precise visual generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates image generation with deep compression

Uses joint text and coding latent guidance

Enables precise scene reconstruction at low bitrate

🔎 Similar Papers

Object-Attribute-Relation Representation based Video Semantic Communication

2024-06-15arXiv.orgCitations: 1

TikTok

San Jose, California

Research Scientist Intern, Photorealistic Telepresence (PhD)