🤖 AI Summary
This work addresses the limitations of traditional image quality assessment (IQA), which relies on a single scalar score, struggles to discern subtle differences among high-quality images, and offers little interpretability. To overcome these limitations, the study applies multimodal large language models (MLLMs) to professional-grade IQA and establishes a new benchmark in which models compare high-quality image pairs, select the better image, and explain the choice in expert-level natural language. This dual capability of “selection + explanation” goes beyond conventional scalar-score IQA. The challenge attracted nearly 200 registrations and over 2,500 submissions, advancing the field toward interpretable, fine-grained image quality evaluation.
📝 Abstract
In this paper, we present an overview of the NTIRE 2026 challenge on the 3rd Restore Any Image Model in the Wild, specifically focusing on Track 1: Professional Image Quality Assessment. Conventional Image Quality Assessment (IQA) typically relies on scalar scores. By compressing complex visual characteristics into a single number, these methods fundamentally struggle to distinguish subtle differences among uniformly high-quality images. Furthermore, they cannot articulate why one image is superior, and so lack the reasoning capabilities required to provide guidance for vision tasks. Recent advances in Multimodal Large Language Models (MLLMs) offer a promising way to bridge this gap. Inspired by this potential, our challenge establishes a novel benchmark exploring the ability of MLLMs to mimic human expert cognition in evaluating high-quality image pairs. Participants were tasked with overcoming critical bottlenecks in professional scenarios, centering on two primary objectives: (1) Comparative Quality Selection: reliably identifying the visually superior image within a high-quality pair; and (2) Interpretative Reasoning: generating grounded, expert-level explanations that detail the rationale behind the selection. In total, the challenge attracted nearly 200 registrations and over 2,500 submissions. The top-performing methods significantly advanced the state of the art in professional IQA. The challenge dataset is available at https://github.com/narthchin/RAIM-PIQA, and the official homepage is accessible at https://www.codabench.org/competitions/12789/.