Latent Guidance in Diffusion Models for Perceptual Evaluations

📅 2025-05-31
🤖 AI Summary
This paper addresses the insufficient modeling of perceptual consistency in no-reference image quality assessment (NR-IQA). The authors propose Perceptual Manifold Guidance (PMG), which they describe as, to the best of their knowledge, the first method to embed human visual perception features (such as LPIPS and DISTS) into the latent-space sampling process of pre-trained latent diffusion models (LDMs), enabling multi-scale, multi-timestep perceptual guidance. By constraining latent sampling and distilling perceptual features, PMG aims to align the extracted representations with subjective quality judgments. The approach is plug-and-play: it requires no fine-tuning of the backbone LDM and can be applied to any existing pre-trained LDM. Evaluated on mainstream NR-IQA benchmarks, the resulting method, which the abstract calls LGDM, is reported to achieve state-of-the-art performance, improving the correlation between predicted scores and human subjective ratings. These results support the hypothesis that the latent space of diffusion models encodes a structured perceptual manifold that can be exploited explicitly for quality assessment.

📝 Abstract
Despite recent advancements in latent diffusion models that generate high-dimensional image data and perform various downstream tasks, perceptual consistency within these models has been little explored for the task of No-Reference Image Quality Assessment (NR-IQA). In this paper, we hypothesize that latent diffusion models implicitly exhibit perceptually consistent local regions within the data manifold. We leverage this insight to guide on-manifold sampling using perceptual features and input measurements. Specifically, we propose Perceptual Manifold Guidance (PMG), an algorithm that utilizes pretrained latent diffusion models and perceptual quality features to obtain perceptually consistent multi-scale and multi-timestep feature maps from the denoising U-Net. We empirically demonstrate that these hyperfeatures exhibit high correlation with human perception in IQA tasks. Our method can be applied to any existing pretrained latent diffusion model and is straightforward to integrate. To the best of our knowledge, this paper is the first work on guiding diffusion models with perceptual features for NR-IQA. Extensive experiments on IQA datasets show that our method, LGDM, achieves state-of-the-art performance, underscoring the superior generalization capabilities of diffusion models for NR-IQA tasks.
Problem

Research questions and friction points this paper is trying to address.

Explores perceptual consistency in latent diffusion models for NR-IQA
Proposes PMG to guide sampling using perceptual quality features
Demonstrates hyperfeatures correlate with human perception in IQA
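
Agreement between predicted quality scores and human ratings is commonly summarized in IQA work with rank and linear correlation coefficients. The sketch below is illustrative only: the page does not say which coefficients the paper reports, so the use of Spearman (SRCC) and Pearson (PLCC) correlation is an assumption, and the score arrays are made-up toy values, not results from the paper.

```python
import numpy as np

def rankdata_average(x):
    """Return 1-based ranks, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks of tied values with their mean rank.
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def plcc(pred, mos):
    """Pearson linear correlation between predictions and ratings."""
    return float(np.corrcoef(pred, mos)[0, 1])

def srcc(pred, mos):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(rankdata_average(pred), rankdata_average(mos))

# Hypothetical toy data: model scores and mean opinion scores (MOS).
predicted = [0.10, 0.40, 0.35, 0.80, 0.95]
mos = [1.2, 2.9, 3.1, 4.0, 4.8]

print("SRCC:", srcc(predicted, mos))
print("PLCC:", plcc(predicted, mos))
```

In this toy example only one pair of images is ranked in the wrong order, so the SRCC is 0.9; a value near 1 means the model orders images almost exactly as human raters do.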
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploits perceptually consistent local regions of the data manifold in pretrained latent diffusion models
Introduces Perceptual Manifold Guidance (PMG) algorithm
Extracts multi-scale, multi-timestep feature maps (hyperfeatures) from the denoising U-Net using perceptual quality features
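
One generic way to combine feature maps taken at several scales and timesteps is to resize them to a common resolution and concatenate them along the channel axis. The sketch below shows that idea and is not the authors' implementation: the function names, the (C, H, W) array layout, the nearest-neighbour resizing, and the random arrays standing in for U-Net activations are all assumptions made for illustration.

```python
import numpy as np

def resize_nearest(fmap, size):
    """Nearest-neighbour resize of a (C, H, W) map to (C, size, size)."""
    _, h, w = fmap.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return fmap[:, rows[:, None], cols[None, :]]

def build_hyperfeature(feature_maps, size):
    """Resize every map to size x size and stack them along channels."""
    resized = [resize_nearest(f, size) for f in feature_maps]
    return np.concatenate(resized, axis=0)

# Hypothetical stand-ins for activations from different U-Net blocks
# and timesteps; real maps would come from the pretrained model.
rng = np.random.default_rng(0)
maps = [
    rng.standard_normal((4, 8, 8)),
    rng.standard_normal((8, 4, 4)),
    rng.standard_normal((4, 16, 16)),
]

hyper = build_hyperfeature(maps, size=16)
descriptor = hyper.mean(axis=(1, 2))  # one pooled value per channel
print(hyper.shape, descriptor.shape)
```

The pooled descriptor could then be passed to a small regressor that predicts a quality score; that step is likewise an assumption and is not described on this page.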
Shreshth Saini
Ph.D. Student at University of Texas at Austin
Machine Learning, Deep Learning, Computer Vision, Video Engineering, Medical Imaging Analysis
Ru-Ling Liao
Alibaba Group, Sunnyvale, USA
Yan Ye
Alibaba Inc
video coding
A. Bovik
Laboratory for Image and Video Engineering (LIVE), The University of Texas at Austin, Texas, USA