🤖 AI Summary
This work explores a generative image compression paradigm leveraging large AIGC models (specifically GPT-4o) to replace conventional pixel-level transform coding, aiming to preserve semantic and structural fidelity at ultra-low bitrates. Methodologically, it introduces structured raster-scan prompting to efficiently encode image semantics and spatial layout into compact textual prompts; integrates multimodal conditioning from ultra-low-resolution images and text; and enforces semantic consistency constraints to enhance reconstruction fidelity. It presents the first systematic validation of generative compression feasibility below 0.1 bits per pixel (bpp). Experiments demonstrate substantial improvements over state-of-the-art generative and multimodal compression methods in structural reconstruction and fine-detail recovery. The results empirically validate the “generation-as-compression” principle and establish a novel compression direction that bypasses explicit pixel modeling—paving the way for semantic-driven, model-based image compression.
📝 Abstract
The rapid development of AIGC foundation models is revolutionizing the paradigm of image compression, paving the way for abandoning most pixel-level transform and coding and compelling us to ask: why compress what you can generate, if an AIGC foundation model is powerful enough to faithfully generate intricate structures and fine-grained details from nothing more than compact descriptors, i.e., texts or cues? Fortunately, OpenAI's recent GPT-4o image generation has achieved impressive cross-modality generation, editing, and design capabilities, which motivates us to answer the above question by exploring its potential in the image compression field. In this work, we investigate two typical compression paradigms: textual coding and multimodal coding (i.e., text + extremely low-resolution image), where all or most pixel-level information is generated rather than compressed, via the advanced GPT-4o image generation function. The essential challenge lies in maintaining semantic and structural consistency during the decoding process. To overcome this, we propose a structured raster-scan prompt engineering mechanism that transforms the image into the textual space; the resulting prompt is compressed and used as the condition for GPT-4o image generation. Extensive experiments show that the combination of our structured raster-scan prompts and GPT-4o's image generation function achieves impressive performance compared with recent multimodal/generative image compression methods at ultra-low bitrates, further indicating the potential of AIGC generation for image compression.
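To see why textual coding lands so far below conventional bitrates, it helps to work the bits-per-pixel arithmetic: bpp is simply the compressed payload size in bits divided by the pixel count. The sketch below is illustrative only; the prompt text is invented, and `zlib` merely stands in for whatever lossless coder a real system would apply to the prompt, not the paper's actual method.

```python
import zlib

# Hypothetical structured raster-scan prompt describing an image's
# semantics and spatial layout in scan order (invented example text).
prompt = (
    "top-left: blue sky with scattered clouds; "
    "top-right: sun partially occluded by clouds; "
    "center: red brick house, two windows, wooden door; "
    "bottom: green lawn with a gravel path leading to the door."
)

# Losslessly compress the prompt; zlib is a stand-in entropy coder
# (an assumption for illustration, not the paper's design).
compressed = zlib.compress(prompt.encode("utf-8"), level=9)

# Assumed image resolution for the bpp calculation.
width, height = 1024, 1024
bpp = len(compressed) * 8 / (width * height)

print(f"{len(compressed)} bytes -> {bpp:.4f} bpp")
```

Even a few hundred bytes of prompt over a 1024x1024 image works out to roughly 0.002 bpp, orders of magnitude below the 0.1 bpp ultra-low-bitrate regime, which is what makes generation-conditioned decoding attractive at these rates.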