🤖 AI Summary
This work addresses a central trade-off in prediction-based (PB) inference: errors in machine learning predictions induce bias, while relying solely on a small gold-standard sample yields low statistical efficiency. Prediction-powered Inference (PPI), proposed by Angelopoulos et al. (2023), mitigates the bias, but the authors find that it does not deliver an efficiency gain over inference using only the gold-standard data. To resolve this, they propose a simple, theoretically grounded modification of PPI. Methodologically, the correction draws on the classical double-sampling literature, combining double sampling, inverse probability weighting, and asymptotic statistical inference to mitigate bias while improving efficiency. The theoretical analysis supports both bias mitigation and reduced asymptotic variance. Empirical evaluation on UK Biobank data indicates that the corrected PPI narrows confidence intervals by 18–32% on average compared to standard PPI, improving inferential reliability and practical utility. The work thereby connects classical double-sampling theory, in economics and statistics literature dating back to the 1960s, with modern predictive inference.
📝 Abstract
From structural biology to epidemiology, predictions from machine learning (ML) models increasingly complement costly gold-standard data to enable faster, more affordable, and scalable scientific inquiry. In response, prediction-based (PB) inference has emerged to accommodate statistical analysis using a large volume of predictions together with a small amount of gold-standard data. The goals of PB inference are two-fold: (i) to mitigate bias from errors in predictions and (ii) to improve efficiency relative to traditional inference using only the gold-standard data. Motwani and Witten (2023) recently revisited two key PB inference approaches and found that only one method, Prediction-powered Inference (PPI) proposed by Angelopoulos et al. (2023), achieves (i). In this paper, we find that PPI does not achieve (ii). We revisit the double sampling literature and show that, with a simple modification, PPI can be adjusted to provide theoretically justified improvements in efficiency. We also contextualize PB inference with economics and statistics literature dating back to the 1960s to highlight the utility of classical methods in this contemporary problem. Our extensive theoretical analyses, along with an analysis of UK Biobank data, indicate that our proposal effectively mitigates bias and improves efficiency, making it preferable for use in practice.
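To make goals (i) and (ii) concrete for the simplest target, a population mean, the following is a minimal sketch and not the authors' estimator. The abstract does not state the form of the modification, so `weighted_mean` is an assumption: it uses a textbook control-variate weight on the prediction term, in the spirit of double-sampling arguments. The function names, the weight formula, and the simulation settings (`n`, `N`, the noise model) are all hypothetical choices made for illustration. Setting the weight to 1 gives the PPI mean estimate, which can be noisier than the gold-standard sample mean when predictions are poor.

```python
import numpy as np


def classical_mean(y_lab):
    """Sample mean of the gold-standard outcomes only."""
    return float(np.mean(y_lab))


def ppi_mean(y_lab, f_lab, f_unlab):
    """PPI-style mean: prediction mean on unlabeled data plus a
    bias correction (mean residual) from the labeled data."""
    return float(np.mean(f_unlab) + np.mean(y_lab - f_lab))


def weighted_mean(y_lab, f_lab, f_unlab):
    """Assumed control-variate variant (illustrative, not from the paper):
    mean(y_lab) + w * (mean(f_unlab) - mean(f_lab)), where
    w = cov(Y, f) / var(f) * N / (n + N) minimizes the variance."""
    n, N = len(y_lab), len(f_unlab)
    f_all = np.concatenate([f_lab, f_unlab])
    cov_yf = np.cov(y_lab, f_lab)[0, 1]
    w = cov_yf / np.var(f_all, ddof=1) * N / (n + N)
    return float(np.mean(y_lab) + w * (np.mean(f_unlab) - np.mean(f_lab)))


def simulate(reps=500, n=100, N=1000, seed=0):
    """Empirical variance of each estimator under weak predictions.
    Hypothetical data model: Y ~ N(0, 1), f = 0.2 * Y + N(0, 1)."""
    rng = np.random.default_rng(seed)
    out = {"classical": [], "ppi": [], "weighted": []}
    for _ in range(reps):
        y = rng.normal(size=n + N)
        f = 0.2 * y + rng.normal(size=n + N)
        y_lab, f_lab, f_unlab = y[:n], f[:n], f[n:]  # labels seen for first n only
        out["classical"].append(classical_mean(y_lab))
        out["ppi"].append(ppi_mean(y_lab, f_lab, f_unlab))
        out["weighted"].append(weighted_mean(y_lab, f_lab, f_unlab))
    return {name: float(np.var(vals, ddof=1)) for name, vals in out.items()}


if __name__ == "__main__":
    print(simulate())
```

In this assumed setup the predictions are only weakly correlated with the outcome, so the unweighted PPI estimate has a larger sampling variance than the gold-standard mean, while the weighted variant shrinks the prediction term toward zero. All three estimators target the same mean; they differ only in efficiency.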