ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

📅 2026-05-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes a full-reference image quality assessment (IQA) paradigm centered on machine task utility rather than human perception. The authors introduce "latent machine utility" as the evaluation criterion and approximate it through pairwise predictive-consistency comparisons, constructing the PCMP dataset of PSNR-matched distortion pairs labeled by consistency votes from multiple pretrained models. They also develop ML-CLIPSim, a differentiable metric built on a frozen, pre-trained CLIP visual encoder that aggregates intermediate patch-token similarities and global image embeddings. On machine-preference benchmarks, ML-CLIPSim aligns better with machine-oriented preferences than conventional fidelity and perceptual metrics while remaining competitive on human IQA datasets; used as the distortion term in learned image compression, it improves the rate-task trade-off across multiple downstream tasks.
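As an illustration of how consistency votes could label one PSNR-matched pair, here is a minimal sketch. The voting rule is an assumption, not the paper's documented protocol: a model votes for the distorted image whose prediction still matches its prediction on the reference, and abstains when both or neither match. The function name `consistency_vote` and the toy labels are hypothetical.

```python
from collections import Counter

def consistency_vote(ref_preds, preds_a, preds_b):
    """Label a distortion pair (A, B) by majority consistency vote.

    Each argument holds one prediction per model: on the reference image,
    on distorted image A, and on distorted image B (assumed voting rule).
    Returns 'A', 'B', or 'tie'.
    """
    votes = Counter()
    for ref, a, b in zip(ref_preds, preds_a, preds_b):
        a_ok, b_ok = (a == ref), (b == ref)
        if a_ok and not b_ok:
            votes["A"] += 1
        elif b_ok and not a_ok:
            votes["B"] += 1
        # Both or neither consistent: this model abstains.
    if votes["A"] > votes["B"]:
        return "A"
    if votes["B"] > votes["A"]:
        return "B"
    return "tie"

# Toy example with three hypothetical classifiers.
ref_preds = ["cat", "cat", "tabby"]
preds_a = ["cat", "cat", "dog"]
preds_b = ["dog", "cat", "dog"]
label = consistency_vote(ref_preds, preds_a, preds_b)
print(label)
```

Only the first model discriminates between the two distortions here, so the pair is labeled in favor of A; pairs that end in a tie could simply be discarded when building a dataset of this kind.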
📝 Abstract
We study full-reference image quality assessment from a machine-centric perspective, where images are evaluated by how well they preserve information for downstream models. We formulate machine-oriented quality as a latent machine utility and approximate it through pairwise predictive-consistency comparisons. To this end, we construct PCMP, a dataset of PSNR-matched distortion pairs labeled by consistency votes from multiple pretrained models. We further propose ML-CLIPSim, a differentiable quality metric built on a frozen CLIP visual encoder, which aggregates intermediate patch-token similarities and global image embeddings. Experiments on machine-preference benchmarks, human-IQA datasets, and learned image compression show that ML-CLIPSim better aligns with machine-oriented preferences than conventional fidelity and perceptual metrics, while remaining competitive for human quality prediction. Used as a compression distortion term, it improves rate-task trade-offs across multiple downstream tasks.
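The aggregation described in the abstract can be pictured with a small NumPy sketch. It is not the authors' implementation: the features are random stand-ins for CLIP activations, and the uniform layer averaging, the mixing weight `alpha`, and the function names are all assumptions made for illustration. A differentiable version would compute the same quantities in an autodiff framework on real encoder outputs.

```python
import numpy as np

def cosine(u, v, eps=1e-8):
    """Cosine similarity along the last axis."""
    num = np.sum(u * v, axis=-1)
    den = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + eps
    return num / den

def multilayer_similarity(ref_tokens, dist_tokens, ref_global, dist_global, alpha=0.5):
    """Combine per-layer patch-token similarity with global-embedding similarity.

    ref_tokens / dist_tokens: lists of (num_patches, dim) arrays, one per layer.
    ref_global / dist_global: (dim,) global image embeddings.
    alpha: assumed mixing weight between the patch and global terms.
    """
    # Mean patch-wise cosine similarity per layer, then a uniform layer average.
    layer_scores = [cosine(r, d).mean() for r, d in zip(ref_tokens, dist_tokens)]
    patch_term = float(np.mean(layer_scores))
    global_term = float(cosine(ref_global, dist_global))
    return alpha * patch_term + (1.0 - alpha) * global_term

# Random stand-ins for features from three intermediate layers (49 patches, 64 dims).
rng = np.random.default_rng(0)
ref_tokens = [rng.normal(size=(49, 64)) for _ in range(3)]
ref_global = rng.normal(size=64)
noisy_tokens = [t + 0.5 * rng.normal(size=t.shape) for t in ref_tokens]
noisy_global = ref_global + 0.5 * rng.normal(size=64)

same = multilayer_similarity(ref_tokens, ref_tokens, ref_global, ref_global)
noisy = multilayer_similarity(ref_tokens, noisy_tokens, ref_global, noisy_global)
print(same, noisy)
```

Identical features score approximately 1, and perturbed features score lower. Under the same assumptions, a compression loss could use one minus this score as its distortion term alongside the rate term.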
Problem

Research questions and friction points this paper is trying to address.

image quality assessment
machine-oriented
full-reference
downstream tasks
distortion
Innovation

Methods, ideas, or system contributions that make the work stand out.

machine-oriented IQA
predictive consistency
CLIP-based metric
learned image compression
latent machine utility