PreResQ-R1: Towards Fine-Grained Rank-and-Score Reinforcement Learning for Visual Quality Assessment via Preference-Response Disentangled Policy Optimization

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing visual quality assessment (VQA) methods suffer from shallow reasoning, poor score calibration, and weak cross-domain generalization due to their reliance on supervised fine-tuning or single-objective ranking. This paper proposes a preference-response disentangled reinforcement learning framework featuring a dual-branch reward mechanism and Group Relative Policy Optimization (GRPO), unifying absolute scoring and relative ranking objectives for fine-grained, interpretable quality inference. Technically, it combines reinforcement fine-tuning of multimodal large language models with spatiotemporal data-flow modeling for video. Evaluated on 10 image quality assessment (IQA) and 5 VQA benchmarks, the method achieves state-of-the-art performance: +5.30% Spearman rank correlation coefficient (SRCC) and +2.15% Pearson linear correlation coefficient (PLCC) on IQA tasks, with significantly improved cross-domain generalization and human-aligned reasoning consistency.

📝 Abstract
Visual Quality Assessment (QA) seeks to predict human perceptual judgments of visual fidelity. While recent multimodal large language models (MLLMs) show promise in reasoning about image and video quality, existing approaches mainly rely on supervised fine-tuning or rank-only objectives, resulting in shallow reasoning, poor score calibration, and limited cross-domain generalization. We propose PreResQ-R1, a Preference-Response Disentangled Reinforcement Learning framework that unifies absolute score regression and relative ranking consistency within a single reasoning-driven optimization scheme. Unlike prior QA methods, PreResQ-R1 introduces a dual-branch reward formulation that separately models intra-sample response coherence and inter-sample preference alignment, optimized via Group Relative Policy Optimization (GRPO). This design encourages fine-grained, stable, and interpretable chain-of-thought reasoning about perceptual quality. To extend beyond static imagery, we further design a global-temporal and local-spatial data flow strategy for Video Quality Assessment. Remarkably, with reinforcement fine-tuning on only 6K images and 28K videos, PreResQ-R1 achieves state-of-the-art results across 10 IQA and 5 VQA benchmarks under both SRCC and PLCC metrics, surpassing prior methods by margins of 5.30% (SRCC) and 2.15% (PLCC) on IQA tasks. Beyond quantitative gains, it produces human-aligned reasoning traces that reveal the perceptual cues underlying quality judgments. Code and model are available.
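To make the abstract's dual-branch reward and GRPO objective concrete, here is a minimal sketch in Python. All function names, reward forms, and thresholds (`response_reward`, `preference_reward`, `tol`) are hypothetical illustrations, not the paper's actual formulation: the intra-sample branch rewards scores close to the ground-truth mean opinion score (MOS), the inter-sample branch rewards predictions that preserve pairwise quality orderings, and GRPO normalizes each sampled response's reward against its group statistics instead of using a learned value critic.

```python
import numpy as np

def response_reward(pred_score, mos, tol=1.0):
    # Intra-sample branch (hypothetical form): reward a sampled response
    # whose predicted score falls within `tol` of the ground-truth MOS.
    return float(abs(pred_score - mos) <= tol)

def preference_reward(pred_a, pred_b, mos_a, mos_b):
    # Inter-sample branch (hypothetical form): reward a pair of predictions
    # that preserves the ground-truth quality ordering of the two samples.
    return float((pred_a - pred_b) * (mos_a - mos_b) > 0)

def grpo_advantages(rewards, eps=1e-8):
    # GRPO-style advantage: normalize each response's reward by the mean
    # and standard deviation of its sampling group (no value critic).
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Example: four sampled responses for one image with MOS 4.5,
# combining both reward branches with equal (hypothetical) weights.
preds = [4.2, 2.0, 4.8, 1.5]
rewards = [response_reward(p, 4.5) + preference_reward(p, 2.5, 4.5, 2.0)
           for p in preds]
advantages = grpo_advantages(rewards)
```

The group-relative normalization means advantages sum to (approximately) zero within each group, so better-than-average responses are reinforced and worse ones suppressed, which is what lets the method unify scoring and ranking signals in one policy update.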
Problem

Research questions and friction points this paper is trying to address.

Improves visual quality assessment with fine-grained reasoning and score calibration
Addresses shallow reasoning and poor generalization in existing QA methods
Unifies absolute score regression and relative ranking in reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference-Response Disentangled Reinforcement Learning framework
Dual-branch reward formulation for intra- and inter-sample modeling
Global-temporal and local-spatial data flow for videos
Zehui Feng
Shanghai Jiao Tong University, Shanghai, China
Tian Qiu
Shanghai Jiao Tong University, Shanghai, China
Tong Wu
Shanghai Jiao Tong University, Shanghai, China
Junxuan Li
Research Scientist, Codec Avatars Lab, Meta
Huayuan Xu
Shanghai Jiao Tong University, Shanghai, China
Ting Han
Shanghai Jiao Tong University, Shanghai, China; Zhejiang University, Hangzhou, China