Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of a controlled analysis of how pairwise preference and pointwise rating annotation protocols complement each other in image aesthetic assessment. The authors introduce PPaint, a matched dual-protocol benchmark in which domain experts annotate 150 Chinese paintings under both protocols across five aesthetic dimensions, and the two signals are fused into a single ground truth. Building on this, they propose PSDistill, a self-distillation method that uses an Elo reference pool to convert pairwise judgments from a vision-language model (Qwen3-VL-8B) into calibrated pseudo-scores for label-free training. Trained on a single painting category, the distilled model raises mean SRCC from 0.504 to 0.709 across all three categories, outperforming all open-source baselines and coming within 0.04 SRCC of Gemini-3.1-Pro. Cross-domain transfer is further validated on the APDDv2 benchmark.
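To make the preference-to-score idea concrete, here is a minimal sketch of a textbook Elo update applied to pairwise "winner beats loser" judgments, followed by a linear rescale to a rating range. It is not the PSDistill procedure itself. The summary does not specify the reference pool construction or the calibration step, so the K-factor, the initial rating, the number of passes, the 1 to 10 target range, and all function names below are illustrative assumptions.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(items, comparisons, k=32.0, init=1500.0, epochs=10):
    """Run sequential Elo updates over (winner, loser) pairs.

    k, init and epochs are assumed values, not taken from the paper.
    """
    ratings = {item: init for item in items}
    for _ in range(epochs):
        for winner, loser in comparisons:
            e_w = expected_score(ratings[winner], ratings[loser])
            delta = k * (1.0 - e_w)  # larger update for surprising wins
            ratings[winner] += delta
            ratings[loser] -= delta
    return ratings


def rescale(ratings, lo=1.0, hi=10.0):
    """Map Elo ratings linearly onto [lo, hi] (assumed calibration step)."""
    r_min, r_max = min(ratings.values()), max(ratings.values())
    if r_max == r_min:
        # No ordering information: place everything at the midpoint.
        return {item: (lo + hi) / 2.0 for item in ratings}
    span = r_max - r_min
    return {item: lo + (r - r_min) / span * (hi - lo)
            for item, r in ratings.items()}


# Hypothetical judgments: painting "a" beats "b" and "c", "b" beats "c".
judgments = [("a", "b"), ("b", "c"), ("a", "c")]
ratings = elo_ratings(["a", "b", "c"], judgments)
pseudo_scores = rescale(ratings)
```

In this sketch the ratings only fix an ordering and relative gaps. The final rescale is what attaches them to an absolute scale, which mirrors the abstract's observation that preferences order while ratings anchor.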

📝 Abstract

Pairwise preferences and pointwise ratings are the two dominant annotation protocols in image aesthetic assessment (IAA), yet existing benchmarks adopt only one, leaving their complementarity unmeasured under controlled conditions. We introduce PPaint, a matched dual-protocol benchmark in which 15 domain experts, 5 per category, annotate 150 Chinese paintings under both protocols across five aesthetic dimensions, collecting 45,900 pairwise expert judgments through a locally dense preference design alongside the matched ratings. The matched design reveals complementary strengths: preferences yield more consistent ordinal rankings, while ratings anchor the absolute score scale. Fusing both signals via two independent preference-to-score methods yields a fused expert ground truth on which the two constructions converge to nearly identical scores. The same preference-to-score principle extends to label-free VLM training. PSDistill converts VLM pairwise judgments into calibrated pseudo-scores via an Elo reference pool, and trains the same VLM with confidence-weighted ranking optimization to produce a single-pass aesthetic scorer. Trained on a single painting category, the distilled Qwen3-VL-8B improves mean SRCC from 0.504 to 0.709 across all three categories, outperforming all open-source baselines including the dedicated aesthetic model ArtiMuse and matching closed-source Gemini-3.1-Pro within 0.04 SRCC at single-pass inference cost, with cross-domain transfer further validated on APDDv2. We will release the full PPaint dataset and training code.
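The results above are reported as SRCC (Spearman rank correlation coefficient), which measures how well predicted scores reproduce the rank order of the ground truth. The following is a minimal standard-library sketch of that metric, computed as the Pearson correlation of average ranks so that ties are handled. The function names and the example values are illustrative and do not come from the paper's evaluation code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("SRCC is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


def srcc(predicted, target):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(predicted) != len(target):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(predicted), average_ranks(target))


# Hypothetical model scores versus hypothetical expert scores.
model_scores = [6.1, 7.4, 5.0, 8.8]
expert_scores = [5.5, 7.0, 6.0, 9.0]
print(round(srcc(model_scores, expert_scores), 3))  # prints 0.8
```

Because SRCC depends only on ranks, it rewards a scorer for ordering paintings correctly even when its absolute scores sit on a different scale from the expert ratings.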

Problem

Research questions and friction points this paper is trying to address.

image aesthetic assessment

pairwise preferences

pointwise ratings

annotation protocols

ground truth

Innovation

Methods, ideas, or system contributions that make the work stand out.

pairwise preferences

pointwise ratings

self-distillation