Dodgersort: Uncertainty-Aware VLM-Guided Human-in-the-Loop Pairwise Ranking

2026-03-21
Citations: 0
Influential: 0
AI Summary
This work addresses the quadratic cost of exhaustive pairwise comparison labeling by proposing an active ranking framework that integrates vision-language models with human feedback. The approach combines CLIP-based hierarchical pre-ordering, a neural ranking head, a probabilistic ensemble of Elo, Bradley–Terry–Luce (BTL), and Gaussian Process (GP) models, decomposition of epistemic and aleatoric uncertainty, and an information-theoretic pair selection strategy. Evaluated on medical imaging, historical dating, and aesthetic ranking tasks, the method reduces annotation effort by 11–16% while improving inter-rater reliability. Cross-domain ablations attribute this gain to neural adaptation and ensemble uncertainty. On FG-NET, it extracts 5–20× more ranking information per pairwise comparison than baselines, attaining Pareto-optimal trade-offs between accuracy and efficiency.
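
As an illustration of two of the ensemble members named above, the following is a minimal sketch of the standard Elo update and the Bradley–Terry–Luce win probability. The function names, the K-factor of 32, and the 400-point scale are conventional defaults assumed here for illustration; the summary does not state the paper's parameterization.

```python
import math


def btl_win_prob(score_i: float, score_j: float) -> float:
    """BTL probability that item i beats item j, given log-strength scores."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def elo_expected(rating_i: float, rating_j: float, scale: float = 400.0) -> float:
    """Expected score of item i against item j under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_j - rating_i) / scale))


def elo_update(rating_i: float, rating_j: float, i_won: bool, k: float = 32.0):
    """Return updated (rating_i, rating_j) after one human comparison.

    k is an assumed step size, not a value taken from the paper.
    """
    outcome = 1.0 if i_won else 0.0
    delta = k * (outcome - elo_expected(rating_i, rating_j))
    return rating_i + delta, rating_j - delta
```

Both models map a score difference to a win probability through a logistic curve, which is what lets their predictions be pooled in one ensemble.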

Abstract
Pairwise comparison labeling is gaining adoption because it yields higher inter-rater reliability than conventional classification labeling, but exhaustive comparison incurs quadratic cost. We propose Dodgersort, which leverages CLIP-based hierarchical pre-ordering, a neural ranking head and probabilistic ensemble (Elo, BTL, GP), epistemic–aleatoric uncertainty decomposition, and information-theoretic pair selection. It reduces the number of human comparisons while improving the reliability of the resulting rankings. On visual ranking tasks in medical imaging, historical dating, and aesthetics, Dodgersort achieves an 11–16% annotation reduction while improving inter-rater reliability. Cross-domain ablations across four datasets show that neural adaptation and ensemble uncertainty are key to this gain. On FG-NET, which has ground-truth ages, the framework extracts 5–20× more ranking information per comparison than baselines, yielding Pareto-optimal accuracy–efficiency trade-offs.
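
One common way to realize an epistemic–aleatoric decomposition with information-theoretic pair selection is the entropy-based split sketched below. It is an assumption for illustration, not the method confirmed by the abstract: total uncertainty is the entropy of the ensemble-averaged win probability, aleatoric uncertainty is the average entropy of the individual members, and their difference (a mutual information) is treated as the epistemic part. The names `decompose` and `select_pair` are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def decompose(member_probs):
    """Split predictive uncertainty for one pair into (total, aleatoric, epistemic).

    member_probs holds each ensemble member's probability that item i beats item j.
    """
    mean_p = sum(member_probs) / len(member_probs)
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    epistemic = max(total - aleatoric, 0.0)  # clamp floating-point noise
    return total, aleatoric, epistemic


def select_pair(pair_probs):
    """Pick the pair whose outcome the ensemble members disagree on most.

    pair_probs maps (i, j) to the list of member probabilities for that pair.
    """
    return max(pair_probs, key=lambda pair: decompose(pair_probs[pair])[2])
```

Under this split, a pair that every member rates as a coin flip carries high aleatoric but no epistemic uncertainty, so asking a human about it is expected to teach the model little. A pair on which the members disagree is the one the selection rule queries first.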
Problem

Research questions and friction points this paper is trying to address.

pairwise comparison
annotation efficiency
ranking reliability
human-in-the-loop
uncertainty-aware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty-aware ranking
Human-in-the-loop
Pairwise comparison
Neural ensemble
Information-theoretic selection