Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image codecs offer a fixed set of coding modes and rely on manual configuration by engineers, which limits their usefulness for non-expert users with diverse requirements. To address this, the paper proposes Comp-X, which it describes as the first large language model (LLM)-driven interactive image compression paradigm. The method introduces a multi-functional coding framework that unifies human-machine perception, variable coding, and spatial bit allocation; designs an interactive coding agent that uses in-context learning augmented with coding-expert feedback to interpret natural-language compression requests, select modes, and invoke coding tools; and introduces IIC-bench, described as the first dedicated benchmark of diverse user requests with coding-expert annotations for evaluating intelligently interactive image compression. Experiments reported by the authors indicate that Comp-X understands coding requests efficiently and interacts well through text, while maintaining comparable compression performance with a single coding framework. The authors present this as a step toward intelligent, user-centric image compression.

📝 Abstract
We present Comp-X, the first intelligently interactive image compression paradigm, empowered by the reasoning capability of a large language model (LLM) agent. Commonly used image codecs offer limited coding modes and rely on manual mode selection by engineers, which makes them difficult for non-expert users. To overcome this, we advance the image coding paradigm with three key innovations: (i) a multi-functional coding framework, which unifies coding modes for different objectives and requirements, including human-machine perception, variable coding, and spatial bit allocation, into one framework; (ii) an interactive coding agent, for which we propose an augmented in-context learning method with coding-expert feedback that teaches the LLM agent to understand coding requests, select modes, and use the coding tools; and (iii) IIC-bench, the first dedicated benchmark, comprising diverse user requests and corresponding annotations from coding experts, systematically designed for evaluating intelligently interactive image compression. Extensive experimental results demonstrate that Comp-X understands coding requests efficiently and achieves impressive textual interaction capability. Meanwhile, it maintains comparable compression performance even with a single coding framework, providing a promising avenue for artificial general intelligence (AGI) in image compression.
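To make the coding modes named in the abstract concrete, here is a minimal sketch, not the authors' implementation, of how a free-text request might be mapped to a single configuration object. The names `CodingConfig` and `select_config`, the field names, and the keyword rules are all hypothetical. In Comp-X this step is performed by an LLM agent with in-context learning; the keyword matcher below is only a stand-in for it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) region


@dataclass(frozen=True)
class CodingConfig:
    """Hypothetical container for the coding modes named in the abstract."""
    target: str = "human"      # "human" or "machine" perception
    quality: float = 0.5       # variable-rate knob in [0, 1]; higher spends more bits
    roi: Optional[Box] = None  # region to favor in spatial bit allocation


def select_config(request: str, roi: Optional[Box] = None) -> CodingConfig:
    """Keyword-based stand-in (an assumption) for the agent's mode selection."""
    text = request.lower()

    # Perception target: machine-vision tasks versus human viewing.
    machine_words = ("detect", "classif", "recogni")
    target = "machine" if any(w in text for w in machine_words) else "human"

    # Variable coding: pick a coarse quality level from the wording.
    if any(w in text for w in ("small", "tiny", "low bitrate")):
        quality = 0.2
    elif any(w in text for w in ("high quality", "sharp")):
        quality = 0.9
    else:
        quality = 0.5

    # Spatial bit allocation: use the supplied region only if the request asks for one.
    wants_roi = any(w in text for w in ("face", "region", "foreground"))
    return CodingConfig(target=target, quality=quality, roi=roi if wants_roi else None)


if __name__ == "__main__":
    print(select_config("Keep the face sharp but make the file small",
                        roi=(40, 30, 128, 128)))
```

Collecting every mode in one configuration object mirrors the abstract's point that a single framework serves all requests. How the real agent resolves conflicting wishes (such as "sharp" versus "small" above) is not described in the source; the first-match rule here is arbitrary.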
Problem

Research questions and friction points this paper is trying to address.

Overcoming limited coding modes in traditional image compression
Enabling interactive image compression using LLM agents
Unifying diverse coding requirements into a single framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified multi-functional coding framework for diverse objectives
Interactive LLM agent with expert feedback learning
First benchmark for intelligent interactive compression evaluation
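
As an illustration of how a benchmark that pairs user requests with expert annotations could be scored, the sketch below computes exact-match accuracy of predicted coding modes. The record layout, the mode labels, and the `mode_accuracy` function are hypothetical; the source does not describe the actual format or metrics of IIC-bench.

```python
from typing import Callable, Iterable, Tuple


def mode_accuracy(samples: Iterable[Tuple[str, str]],
                  predict: Callable[[str], str]) -> float:
    """Fraction of requests whose predicted mode equals the expert label.

    `samples` holds (request, expert_mode) pairs; this layout is an assumption.
    """
    pairs = list(samples)
    if not pairs:
        raise ValueError("no samples to score")
    correct = sum(1 for request, expert_mode in pairs
                  if predict(request) == expert_mode)
    return correct / len(pairs)


if __name__ == "__main__":
    # Toy data with made-up labels, for illustration only.
    toy = [
        ("shrink this photo as much as possible", "low_rate"),
        ("keep the text readable", "roi"),
    ]
    always_low_rate = lambda request: "low_rate"
    print(mode_accuracy(toy, always_low_rate))
```

Passing the predictor as a callable keeps the scoring independent of whether the mode comes from an LLM agent, a rule set, or a fixed baseline.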