AI Summary
This work addresses the quadratic cost of exhaustive pairwise comparison labeling, and the reliability of the resulting rankings, by proposing an active ranking framework that integrates vision-language models with human feedback. The approach combines CLIP-based hierarchical pre-ranking, a neural ranking head, a probabilistic ensemble of Elo, Bradley-Terry-Luce (BTL), and Gaussian Process (GP) models, a decomposition of epistemic and aleatoric uncertainty, and an information-gain-driven pair selection strategy. This integration enables, for the first time, joint optimization through neural adaptation, multi-model uncertainty ensembling, and information-theoretic guidance. Evaluated on medical imaging, historical dating, and aesthetic ranking tasks, the method reduces annotation effort by 11-16% while improving inter-rater reliability. On FG-NET, it extracts 5-20× more ranking information per pairwise comparison than baselines, attaining Pareto-optimal trade-offs between accuracy and efficiency.
Abstract
Pairwise comparison labeling is gaining traction because it yields higher inter-rater reliability than conventional classification labeling, but exhaustive comparison incurs quadratic cost. We propose Dodgersort, which leverages CLIP-based hierarchical pre-ordering, a neural ranking head and probabilistic ensemble (Elo, BTL, GP), epistemic--aleatoric uncertainty decomposition, and information-theoretic pair selection. It reduces the number of human comparisons while improving the reliability of the rankings. On visual ranking tasks in medical imaging, historical dating, and aesthetics, Dodgersort achieves an 11--16\% annotation reduction while improving inter-rater reliability. Cross-domain ablations across four datasets show that neural adaptation and ensemble uncertainty are key to this gain. On FG-NET, which has ground-truth ages, the framework extracts 5--20$\times$ more ranking information per comparison than baselines, yielding Pareto-optimal accuracy--efficiency trade-offs.
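To make the ideas named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-member ensemble (Elo and BTL only, with the GP model omitted), hand-picked scores, and a standard entropy-based split in which total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the members, and epistemic uncertainty is the difference. All function names (`elo_prob`, `btl_prob`, `decompose`, `select_pair`, `num_pairs`) are hypothetical and chosen for illustration.

```python
import itertools
import math


def num_pairs(n):
    """Number of distinct pairs among n items: the quadratic cost of exhaustive comparison."""
    return n * (n - 1) // 2


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def elo_prob(r_i, r_j, scale=400.0):
    """Elo-style probability that item i is ranked above item j."""
    return 1.0 / (1.0 + 10 ** ((r_j - r_i) / scale))


def btl_prob(s_i, s_j):
    """Bradley-Terry-Luce probability, given positive strengths s_i and s_j."""
    return s_i / (s_i + s_j)


def decompose(probs):
    """Split the uncertainty about one pair across ensemble members.

    Assumed decomposition (not taken from the paper):
      total     = H(mean of member probabilities)
      aleatoric = mean of H(member probability)
      epistemic = total - aleatoric  (disagreement between members)
    """
    mean_p = sum(probs) / len(probs)
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in probs) / len(probs)
    return total, aleatoric, total - aleatoric


def select_pair(elo_ratings, btl_strengths):
    """Return the pair with the largest epistemic uncertainty and that value.

    The epistemic term serves here as a stand-in for expected information gain.
    """
    best_pair, best_gain = None, -1.0
    for i, j in itertools.combinations(range(len(elo_ratings)), 2):
        probs = [
            elo_prob(elo_ratings[i], elo_ratings[j]),
            btl_prob(btl_strengths[i], btl_strengths[j]),
        ]
        _, _, epistemic = decompose(probs)
        if epistemic > best_gain:
            best_pair, best_gain = (i, j), epistemic
    return best_pair, best_gain


if __name__ == "__main__":
    # Illustrative scores: the two models agree on items 0 and 1 but disagree about item 2.
    pair, gain = select_pair([1500.0, 1500.0, 1500.0], [1.0, 1.0, 9.0])
    print(num_pairs(3), pair, round(gain, 3))
```

In this toy setting the selector skips the pair on which both models agree and queries one involving item 2, where the models disagree. That is the intuition behind spending human comparisons where they are expected to be most informative rather than on all `num_pairs(n)` pairs.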