Q-Hawkeye: Reliable Visual Policy Optimization for Image Quality Assessment

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in existing reinforcement learning–based image quality assessment (IQA) methods, which apply uniform sample-update weights and thereby amplify noise while neglecting the model's genuine perceptual sensitivity to image content. To overcome this, the authors propose Q-Hawkeye, a framework that dynamically adjusts each sample's update weight based on predictive uncertainty and introduces an implicit perceptual loss that strengthens the model's reliance on authentic visual evidence through original–distorted image pairs. By integrating multi-rollout variance estimation, uncertainty-aware advantage weighting, and policy optimization, Q-Hawkeye achieves significant gains over state-of-the-art methods across multiple datasets, with improved assessment accuracy and better cross-dataset generalization.

📝 Abstract
Image Quality Assessment (IQA) predicts perceptual quality scores consistent with human judgments. Recent RL-based IQA methods built on MLLMs focus on generating visual quality descriptions and scores, ignoring two key reliability limitations: (i) although the model's prediction stability varies significantly across training samples, existing GRPO-based methods apply uniform advantage weighting, thereby amplifying noisy signals from unstable samples in gradient updates; (ii) most works emphasize text-grounded reasoning over images while overlooking the model's ability to visually perceive image content. In this paper, we propose Q-Hawkeye, an RL-based reliable visual policy optimization framework that redesigns the learning signal through unified Uncertainty-Aware Dynamic Optimization and Perception-Aware Optimization. Q-Hawkeye estimates predictive uncertainty using the variance of predicted scores across multiple rollouts and leverages this uncertainty to reweight each sample's update strength, stabilizing policy optimization. To strengthen perceptual reliability, we construct paired inputs of degraded images and their original images and introduce an Implicit Perception Loss that constrains the model to ground its quality judgments in genuine visual evidence. Extensive experiments demonstrate that Q-Hawkeye outperforms state-of-the-art methods and generalizes better across multiple datasets. The code and models will be made available.
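The uncertainty-aware reweighting described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's released code: the function name, the group-wise reward normalization, and the specific `1 / (1 + variance)` weighting are assumptions — the abstract states only that score variance across rollouts is used to downweight unstable samples' update strength.

```python
import numpy as np

def uncertainty_weighted_advantages(rollout_scores, group_rewards, eps=1e-6):
    """Reweight GRPO-style advantages by per-sample predictive uncertainty.

    rollout_scores: predicted quality scores from G rollouts of one sample.
    group_rewards:  scalar rewards for the same G rollouts.
    Returns advantages scaled down when score variance (uncertainty) is high.
    """
    rollout_scores = np.asarray(rollout_scores, dtype=float)
    group_rewards = np.asarray(group_rewards, dtype=float)

    # Standard GRPO-style advantage: normalize rewards within the rollout group.
    adv = (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

    # Predictive uncertainty: variance of the predicted scores across rollouts.
    uncertainty = rollout_scores.var()

    # Hypothetical weighting: unstable (high-variance) samples get smaller updates.
    weight = 1.0 / (1.0 + uncertainty)
    return weight * adv
```

A sample whose rollouts agree on the score keeps its full advantage, while a sample with widely scattered scores contributes a proportionally damped gradient signal.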
Problem

Research questions and friction points this paper is trying to address.

Image Quality Assessment
Reinforcement Learning
Predictive Uncertainty
Visual Perception
Reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty-Aware Optimization
Perception-Aware Optimization
Implicit Perception Loss
Reliable Visual Policy
Reinforcement Learning for IQA
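The Implicit Perception Loss listed above pairs each degraded image with its original so that quality judgments must track visible degradation. The paper's exact formulation is not given on this page, so the sketch below uses a simple pairwise margin on predicted scores as a stand-in; the function name, the margin value, and the hinge form are all hypothetical.

```python
import numpy as np

def implicit_perception_loss(scores_orig, scores_dist, margin=0.5):
    """Illustrative proxy (NOT the paper's exact loss): penalize the model
    whenever a distorted image is not rated at least `margin` below its
    original, pushing quality scores to reflect actual visual evidence."""
    gap = np.asarray(scores_orig, dtype=float) - np.asarray(scores_dist, dtype=float)
    return np.maximum(margin - gap, 0.0).mean()
```

When originals are scored well above their distorted counterparts the loss vanishes; score inversions are penalized in proportion to how badly the pair is mis-ordered.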
Authors
Wulin Xie — Institute of Automation, Chinese Academy of Sciences (MLLM, Multi-Modal)
Rui Dai — Alibaba Group (machine learning)
Ruidong Ding — Amap, Alibaba Group, Hangzhou, China
Kaikui Liu — Amap, Alibaba Group, Hangzhou, China
Xiangxiang Chu — Amap, Alibaba Group, Hangzhou, China
Xinwen Hou — Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jie Wen — Associate Professor, North University of China (NUC) (Quantum Control, Prognostic and Health Management)