🤖 AI Summary
This paper addresses the problem of estimating multivariate target densities from sparse pairwise comparison data, a setting that arises in expert knowledge elicitation and learning from human feedback. We propose a density estimation framework grounded in score matching and tempering. First, we prove that the score vectors of the belief density and the winner density (the marginal density of preferred choices) are collinear, linked by a position-dependent tempering field for which we derive analytical formulas. Second, under the Bradley–Terry model, we propose an estimator for this field and combine it with score-scaled annealed Langevin dynamics and a score-based diffusion model to reconstruct the target density. In experiments with simulated experts, the method learns complex multivariate belief densities from only hundreds to thousands of pairwise comparisons.
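To make the comparison setting concrete, here is a minimal sketch of how pairwise preferences could be simulated under a Bradley–Terry model. It assumes the expert's utility is the log of the belief density, so that the probability of preferring `a` over `b` is `p(a) / (p(a) + p(b))`. That utility choice, the uniform candidate distribution, the standard normal belief, and all function names are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def bt_win_probability(log_p_a, log_p_b):
    """P(a preferred over b) under Bradley-Terry with utility log p (assumed)."""
    # sigmoid(log p(a) - log p(b)) equals p(a) / (p(a) + p(b))
    diff = np.asarray(log_p_a) - np.asarray(log_p_b)
    return 1.0 / (1.0 + np.exp(-diff))

def simulate_comparisons(log_density, candidates_a, candidates_b, rng):
    """Return the preferred point of each candidate pair."""
    prob_a = bt_win_probability(log_density(candidates_a),
                                log_density(candidates_b))
    a_wins = rng.random(len(candidates_a)) < prob_a
    return np.where(a_wins[:, None], candidates_a, candidates_b)

def log_standard_normal(x):
    # Unnormalized log-density; the constant cancels in the comparison.
    return -0.5 * np.sum(x**2, axis=-1)

rng = np.random.default_rng(0)
# Hypothetical candidate distribution: uniform on a square.
a = rng.uniform(-3.0, 3.0, size=(1000, 2))
b = rng.uniform(-3.0, 3.0, size=(1000, 2))
winners = simulate_comparisons(log_standard_normal, a, b, rng)
```

The array `winners` holds samples from the winner density in this toy setup. They concentrate where the belief density is high, but they are not samples from the belief density itself, which is why a correction such as the tempering field is needed.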
📝 Abstract
We study density estimation from pairwise comparisons, motivated by expert knowledge elicitation and learning from human feedback. We relate the unobserved target density to a tempered winner density (the marginal density of preferred choices) and learn the winner's score via score matching. This allows us to estimate the target by 'de-tempering' the estimated winner density's score. We prove that the score vectors of the belief and the winner density are collinear, linked by a position-dependent tempering field. We give analytical formulas for this field and propose an estimator for it under the Bradley-Terry model. Using a diffusion model trained on tempered samples generated via score-scaled annealed Langevin dynamics, we can learn complex multivariate belief densities of simulated experts from only hundreds to thousands of pairwise comparisons.
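The idea of sampling from a tempered density by scaling a score can be illustrated with a short sketch. It runs unadjusted Langevin dynamics with the score multiplied by a constant factor `beta`, which targets a density proportional to `p(x)**beta`. This is a simplification and not the paper's algorithm: the abstract describes a position-dependent field and an annealing schedule, whereas here `beta` is constant, there is no annealing, and the score is the known score of a standard normal. All names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def score_standard_normal(x):
    """Score (gradient of the log-density) of a standard normal."""
    return -x

def tempered_langevin(score_fn, x0, beta, step_size, n_steps, rng):
    """Unadjusted Langevin dynamics with the score scaled by a constant beta.

    The chain approximately targets a density proportional to p(x)**beta.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step_size * beta * score_fn(x) + np.sqrt(2.0 * step_size) * noise
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(5000)  # 5000 independent one-dimensional chains
samples = tempered_langevin(score_standard_normal, x0,
                            beta=2.0, step_size=0.01, n_steps=2000, rng=rng)
# For a standard normal, p(x)**beta is a normal with variance 1/beta,
# so the sample variance should be close to 0.5 (up to discretization error).
```

Scaling the score by a factor above one sharpens the sampled density, and a factor below one flattens it. Replacing the constant with a function of `x` is what a position-dependent field would amount to in this sketch.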