Perception-Oriented Latent Coding for High-Performance Compressed Domain Semantic Inference

📅 2025-07-02
🤖 AI Summary
Problem: Image coding optimized for mean squared error (MSE) yields semantically impoverished latent representations, which limits downstream task performance. Method: We propose a perception-oriented latent coding framework that jointly optimizes perceptual fidelity and semantic richness. Specifically, we design a perception-weighted loss function and a latent feature enhancement module to explicitly enrich semantic content in the learned latent space. We also introduce a lightweight plug-and-play adapter that enables task-specific adaptation with minimal parameter updates, avoiding full-model fine-tuning. Contribution/Results: Experiments demonstrate that our method maintains state-of-the-art rate-perception trade-offs while significantly improving inference accuracy across multiple vision tasks (e.g., classification and detection). It reduces the number of fine-tuned parameters by over 90% and substantially lowers computational overhead, overcoming the semantic performance bottleneck inherent in MSE-driven image coding.

📝 Abstract
In recent years, compressed domain semantic inference has primarily relied on learned image coding models optimized for mean squared error (MSE). However, MSE-oriented optimization tends to yield latent spaces with limited semantic richness, which hinders effective semantic inference in downstream tasks. Moreover, achieving high performance with these models often requires fine-tuning the entire vision model, which is computationally intensive, especially for large models. To address these problems, we introduce Perception-Oriented Latent Coding (POLC), an approach that enriches the semantic content of latent features for high-performance compressed domain semantic inference. With the semantically rich latent space, POLC requires only a plug-and-play adapter for fine-tuning, significantly reducing the parameter count compared to previous MSE-oriented methods. Experimental results demonstrate that POLC achieves rate-perception performance comparable to state-of-the-art generative image coding methods while markedly enhancing performance in vision tasks, with minimal fine-tuning overhead. Code is available at https://github.com/NJUVISION/POLC.
Problem

Research questions and friction points this paper is trying to address.

Limited semantic richness in MSE-optimized latent spaces
High computational cost of full vision model fine-tuning
Need for efficient compressed domain semantic inference
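
For context, the contrast between MSE-oriented and perception-oriented training can be written as two Lagrangian objectives. This is a generic formulation commonly used for learned image coding, shown only as a sketch; it is not the loss defined in the POLC paper. All symbols are assumptions introduced here: $x$ is the input image, $\hat{x}$ its reconstruction, $\hat{y}$ the quantized latent, $R(\hat{y})$ the estimated bitrate, $d_{\text{perc}}$ an unspecified perceptual distance, and $\lambda$, $\lambda_d$, $\lambda_p$ trade-off weights.

```latex
% MSE-oriented objective: rate plus squared-error distortion
\mathcal{L}_{\text{MSE}} = R(\hat{y}) + \lambda \,\lVert x - \hat{x} \rVert_2^2

% Perception-oriented variant (assumed form): adds a perceptual distance term
\mathcal{L}_{\text{perc}} = R(\hat{y}) + \lambda_d \,\lVert x - \hat{x} \rVert_2^2
                          + \lambda_p \, d_{\text{perc}}(x, \hat{x})
```

Under the first objective nothing rewards the latent for carrying semantic structure beyond what lowers pixel error, which is one way to read the "limited semantic richness" problem listed above.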
Innovation

Methods, ideas, or system contributions that make the work stand out.

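Perception-Oriented Latent Coding (POLC) enriches the semantic content of latent features
POLC needs only a plug-and-play adapter for fine-tuning, not the entire vision model
POLC achieves rate-perception performance comparable to state-of-the-art generative image coding while improving vision-task performance with minimal fine-tuning overhead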
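
To make the adapter idea concrete, here is a minimal sketch of a residual bottleneck adapter, a common parameter-efficient fine-tuning design. It is not the POLC adapter, whose architecture is not described on this page; the class name `BottleneckAdapter`, the helper `trainable_fraction`, and all sizes are hypothetical and chosen only for illustration. The sketch uses plain Python lists so it runs without any deep learning library.

```python
import random


class BottleneckAdapter:
    """Residual bottleneck adapter: y = x + W_up @ relu(W_down @ x + b_down) + b_up."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.bottleneck = bottleneck
        # Down-projection (bottleneck x dim) gets small random weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection (dim x bottleneck) starts at zero, so the adapter
        # is an identity map before any training step.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def num_params(self):
        # Two weight matrices plus two bias vectors.
        return 2 * self.dim * self.bottleneck + self.bottleneck + self.dim

    def forward(self, x):
        # Project down and apply ReLU.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w_down, self.b_down)]
        # Project back up to the original dimension.
        delta = [sum(w * h for w, h in zip(row, hidden)) + b
                 for row, b in zip(self.w_up, self.b_up)]
        # Residual connection: the frozen features pass through unchanged.
        return [v + d for v, d in zip(x, delta)]


def trainable_fraction(adapter_params, frozen_params):
    """Share of all parameters that are updated when only the adapter is trained."""
    return adapter_params / (adapter_params + frozen_params)


# Hypothetical sizes for illustration only; they are not taken from the paper.
adapter = BottleneckAdapter(dim=8, bottleneck=2)
latent = [0.5, -1.0, 0.25, 0.0, 2.0, -0.5, 1.5, 0.75]
adapted = adapter.forward(latent)
fraction = trainable_fraction(adapter.num_params(), frozen_params=10_000)
```

Because the up-projection is zero-initialized, `adapted` equals `latent` before training, so attaching the adapter does not disturb the frozen codec or vision model. Only the adapter's parameters would be updated during fine-tuning, which is why the trainable share stays small when the frozen model is large.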
Xu Zhang
Nanjing University
Ming Lu
Nanjing University
Yan Chen
Jiangsu Academy of Safety Science and Technology
Zhan Ma
Vision Lab, Nanjing University
Learning for Video Coding & Communication, Computational Imaging