Do image and video quality metrics model low-level human vision?

📅 2025-03-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether mainstream image and video quality metrics, such as SSIM, LPIPS, and VMAF, model low-level human visual mechanisms: contrast sensitivity, contrast masking, and contrast matching. To this end, the authors introduce a set of tests for full-reference metrics grounded in psychophysics and use it to evaluate 33 existing image and video quality metrics. Results show that LPIPS and MS-SSIM predict contrast masking well, whereas VMAF performs poorly on this task; SSIM overemphasizes differences in high spatial frequencies, a shortcoming that its multi-scale counterpart, MS-SSIM, addresses. The tests are intended as additional scrutiny for newly proposed metrics and expose weaknesses that existing evaluation protocols do not easily reveal.

📝 Abstract

Image and video quality metrics, such as SSIM, LPIPS, and VMAF, aim to predict the perceived quality of the evaluated content and are often claimed to be "perceptual". Yet, few metrics directly model human visual perception, and most rely on hand-crafted formulas or training datasets to achieve alignment with perceptual data. In this paper, we propose a set of tests for full-reference quality metrics that examine their ability to model several aspects of low-level human vision: contrast sensitivity, contrast masking, and contrast matching. The tests are meant to provide additional scrutiny for newly proposed metrics. We use our tests to analyze 33 existing image and video quality metrics and find their strengths and weaknesses, such as the ability of LPIPS and MS-SSIM to predict contrast masking and the poor performance of VMAF in this task. We further find that the popular SSIM metric overemphasizes differences in high spatial frequencies, but its multi-scale counterpart, MS-SSIM, addresses this shortcoming. Such findings cannot be easily made using existing evaluation protocols.
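To make the idea of a contrast sensitivity test concrete, here is a minimal sketch of how one might probe a full-reference metric with synthetic stimuli. It is not the paper's protocol. The helper names (`gabor`, `rmse_metric`, `detection_contrast`), the criterion value, and the use of cycles per image instead of cycles per degree are all assumptions made for illustration. The sketch searches, for each spatial frequency, for the contrast at which the metric's response to a Gabor patch on a uniform background reaches a fixed criterion, assuming the response grows monotonically with contrast.

```python
import numpy as np

def gabor(size=128, cycles_per_image=8.0, contrast=0.1, sigma_frac=0.15, mean_lum=0.5):
    """Gabor patch on a uniform background (hypothetical stimulus helper)."""
    coords = (np.arange(size) - size / 2) / size      # image units in [-0.5, 0.5)
    x, y = np.meshgrid(coords, coords)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma_frac**2))
    carrier = np.cos(2 * np.pi * cycles_per_image * x)
    return mean_lum * (1 + contrast * envelope * carrier)

def rmse_metric(ref, test):
    """Toy stand-in for a full-reference metric: higher means a larger difference."""
    return float(np.sqrt(np.mean((ref - test) ** 2)))

def detection_contrast(metric, cycles_per_image, criterion, size=128,
                       lo=1e-4, hi=1.0, steps=40):
    """Bisect in log contrast for the contrast at which the metric hits the criterion.

    Assumes the metric response increases monotonically with contrast.
    """
    ref = np.full((size, size), 0.5)                  # uniform reference image
    for _ in range(steps):
        mid = np.sqrt(lo * hi)                        # geometric midpoint
        if metric(ref, gabor(size, cycles_per_image, mid)) < criterion:
            lo = mid
        else:
            hi = mid
    return float(np.sqrt(lo * hi))

freqs = [2, 4, 8, 16, 32]                             # cycles per image (assumed units)
thresholds = [detection_contrast(rmse_metric, f, criterion=0.002) for f in freqs]
sensitivity = [1.0 / t for t in thresholds]           # metric's "contrast sensitivity"

for f, s in zip(freqs, sensitivity):
    print(f"{f:>2} cycles/image: sensitivity {s:.1f}")
```

A pixel-wise error such as RMSE produces an almost flat curve across spatial frequency, whereas human contrast sensitivity is band-pass. Swapping `rmse_metric` for another metric with the same `(ref, test)` signature would show how that metric's curve compares.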

Problem

Research questions and friction points this paper is trying to address.

Evaluate image and video quality metrics' alignment with human vision.

Test metrics' ability to model low-level human vision aspects.

Analyze strengths and weaknesses of 33 existing quality metrics.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tests for full-reference quality metrics

Analyze 33 existing image and video metrics

Examine metrics' ability to model human vision

🔎 Similar Papers

Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model

2024-07-31

Citations: 3
