🤖 AI Summary
This work addresses simultaneous inference on many noisy, biased estimates, each paired with an even noisier estimate of its own bias. Conventional debiasing subtracts the bias estimate from each biased estimate, which inflates variance and yields overly wide confidence intervals because the bias estimates are themselves noisy. The authors propose an empirical Bayes rebiasing strategy that treats the bias as a random effect and estimates its unknown distribution via nonparametric maximum likelihood. Starting from the fully debiased estimates, the method learns from the data how much bias to reintroduce. The resulting intervals are shorter while remaining calibrated, and the authors give convergence rates for their coverage. Empirical results show substantial precision gains in prediction-powered inference, including pairwise LLM win-rate evaluation, and in estimating direct genetic effects in family-based genome-wide association studies (GWAS).
📝 Abstract
We study methods for simultaneous analysis of many noisy and biased estimates, each paired with an even noisier estimate of its own bias. The analyst's goal is to construct short calibrated intervals for each parameter. The standard debiasing approach, which subtracts the bias estimate from each biased estimate, inflates variance and yields long intervals. In this paper, we propose an empirical Bayes rebiasing strategy that starts from the fully debiased estimates and learns from data how much bias to reintroduce by estimating the unknown bias distribution. We provide convergence rates for the coverage of our intervals when the bias distribution is estimated using nonparametric maximum likelihood. Furthermore, we demonstrate substantial precision gains in prediction-powered inference, including pairwise LLM win-rate evaluations, as well as for inference of direct genetic effects in family-based GWAS.
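To make the contrast between debiasing and rebiasing concrete, here is a minimal simulation sketch. It is not the authors' procedure: it assumes Gaussian noise with known standard deviations, independence between each biased estimate and its bias estimate, and a Gaussian prior on the bias fitted by the method of moments, used as a simple stand-in for the nonparametric maximum likelihood step. It also compares point-estimate error only and does not construct intervals. All function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate(n=5000, s=0.5, t=1.0, bias_mean=0.2, bias_sd=0.3, seed=0):
    """Draw n parameters, their biased estimates, and noisier bias estimates."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n)]           # true parameters
    bias = [rng.gauss(bias_mean, bias_sd) for _ in range(n)]  # unknown biases
    # Biased estimate: parameter + bias + small noise (sd s).
    y = [th + b + rng.gauss(0.0, s) for th, b in zip(theta, bias)]
    # Bias estimate: bias + larger noise (sd t > s).
    b_hat = [b + rng.gauss(0.0, t) for b in bias]
    return theta, y, b_hat


def debias(y, b_hat):
    """Standard debiasing: subtract the raw bias estimate."""
    return [yi - bi for yi, bi in zip(y, b_hat)]


def rebias_gaussian(y, b_hat, t):
    """Subtract a shrunken bias estimate instead of the raw one.

    Assumes a Gaussian prior on the bias (a simplification, not NPMLE).
    Equivalent to starting from the debiased estimate and adding back
    the part of the bias estimate attributed to noise.
    """
    m = statistics.fmean(b_hat)
    # Method of moments: Var(b_hat) = prior variance + t^2.
    v = max(statistics.pvariance(b_hat) - t ** 2, 0.0)
    w = v / (v + t ** 2)  # shrinkage weight in [0, 1)
    shrunk = [m + w * (bi - m) for bi in b_hat]
    return [yi - si for yi, si in zip(y, shrunk)]


def mse(estimates, truth):
    """Mean squared error of estimates against the true parameters."""
    return statistics.fmean([(e - tr) ** 2 for e, tr in zip(estimates, truth)])


if __name__ == "__main__":
    t = 1.0
    theta, y, b_hat = simulate(t=t)
    print("debiased MSE:", mse(debias(y, b_hat), theta))
    print("rebiased MSE:", mse(rebias_gaussian(y, b_hat, t), theta))
```

Under these assumptions, the debiased estimator has error variance of roughly s² + t², while the shrinkage version replaces the t² term with the smaller posterior variance of the bias, which is the intuition behind shorter intervals.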