ViDA-UGC: Detailed Image Quality Analysis via Visual Distortion Assessment for UGC Images

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing interpretable image quality assessment (IQA) methods suffer from two key limitations: (1) applying uniform distortion criteria to both user-generated content (UGC) and AI-generated content (AIGC), thereby ignoring their fundamental differences; and (2) lacking fine-grained spatial localization and explainable reasoning capabilities. To address these, we propose the first UGC-specific visual distortion assessment framework. We introduce ViDA-UGC, a large-scale instruction-tuning dataset, and ViDA-UGC-Bench, a dedicated benchmark enabling end-to-end modeling of distortion type classification, spatial localization, perceptual description, and causal reasoning. Our approach innovatively integrates human subjective annotations with GPT-4o–driven chain-of-thought (CoT) annotation, synergizing low-level visual feature analysis and multimodal large language model instruction tuning. Extensive evaluation demonstrates significant performance gains over state-of-the-art models—including GPT-4o—on both ViDA-UGC-Bench and Q-Bench, validating the efficacy of our differentiated assessment paradigm and interpretable quality analysis framework.

📝 Abstract
Recent advances in Multimodal Large Language Models (MLLMs) have introduced a paradigm shift for Image Quality Assessment (IQA), moving from unexplainable quality scoring to explainable IQA with practical applications such as quality control and optimization guidance. However, current explainable IQA methods not only inappropriately apply the same distortion criteria to both User-Generated Content (UGC) and AI-Generated Content (AIGC) images, but also lack the detailed quality analysis needed for monitoring image quality and guiding image restoration. In this study, we establish the first large-scale Visual Distortion Assessment Instruction Tuning Dataset for UGC images, termed ViDA-UGC, which comprises 11K images with fine-grained quality grounding, detailed quality perception, and reasoning-based quality description data. The dataset is constructed through a distortion-oriented pipeline that combines human subjective annotation with a Chain-of-Thought (CoT) assessment framework. This framework guides GPT-4o to generate quality descriptions by identifying and analyzing UGC distortions, which helps capture rich low-level visual features that inherently correlate with distortion patterns. Moreover, we carefully select 476 images with 6,149 corresponding question-answer pairs from ViDA-UGC and invite a professional team to verify the accuracy and quality of the GPT-generated information. The selected and revised data further constitute the first UGC distortion assessment benchmark, termed ViDA-UGC-Bench. Experimental results demonstrate the effectiveness of ViDA-UGC and the CoT framework in consistently enhancing a range of image quality analysis abilities across multiple base MLLMs on ViDA-UGC-Bench and Q-Bench, even surpassing GPT-4o.
Problem

Research questions and friction points this paper is trying to address.

Lack of detailed quality analysis for UGC images
Inadequate distortion criteria for UGC and AIGC evaluation
Need for explainable IQA methods with rich visual features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses ViDA-UGC dataset for detailed quality analysis
Employs Chain-of-Thought framework for distortion assessment
Enhances MLLMs for superior image quality evaluation
Authors
Wenjie Liao — VCIP, CS, Nankai University
Jieyu Yuan — VCIP, CS, Nankai University
Yifang Xu — Bytedance Inc.
Chunle Guo — Nankai University (Deep Learning, Image Enhancement)
Zilong Zhang — VCIP, CS, Nankai University
Jihong Li — Shanghai University (Wireless Communications)
Jiachen Fu — VCIP, CS, Nankai University
Haotian Fan — Bytedance Inc.
Tao Li — Bytedance Inc.
Junhui Cui — Bytedance Inc.
Chongyi Li — Professor, Nankai University (Computer Vision, Computational Imaging, Computational Photography, Underwater Imaging)