🤖 AI Summary
AI/ML predictions are frequently misused as ground-truth observations in downstream statistical inference, inducing bias and erroneous conclusions. To address this, we propose a moment-based post-prediction calibration method: leveraging a small gold-standard sample, we model the error structure between predictions and true values—relaxing strong distributional assumptions on prediction errors—to obtain unbiased point estimates. We introduce an analytically tractable scaling factor that preserves prediction uncertainty while ensuring nominal confidence interval coverage. The method requires no iterative optimization or complex modeling, offering both computational efficiency and robustness. Extensive simulations demonstrate that our approach controls Type I error rates, substantially reduces estimation bias, and achieves confidence interval coverage close to the nominal level. This provides a reliable inferential framework for AI-augmented statistical analysis.
📝 Abstract
Artificial intelligence (AI) and machine learning (ML) are increasingly used to generate data for downstream analyses, yet naively treating these predictions as true observations can lead to biased results and incorrect inference. Wang et al. (2020) proposed post-prediction inference, a method that calibrates inference by modeling the relationship between AI/ML-predicted and observed outcomes in a small, gold-standard sample. Since then, several methods have been developed for inference with predicted data. We revisit Wang et al. in light of these recent developments, reflect on their assumptions, and offer a simple extension of their method that relaxes those assumptions. Our extension (1) yields unbiased point estimates under standard conditions and (2) incorporates a simple scaling factor to preserve calibration variability. In extensive simulations, we show that our method maintains nominal Type I error rates, reduces bias, and achieves proper coverage.
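To make the general workflow concrete, here is a minimal sketch of the post-prediction calibration idea described above: fit a relationship between observed and predicted outcomes on a small gold-standard sample, calibrate the predictions in the large unlabeled sample, and inflate the naive standard error to account for calibration uncertainty. This is an illustrative toy example assuming a simple linear calibration model and an ad hoc variance inflation term, not the exact estimator or scaling factor from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate: true outcome y with mean 5, and AI/ML predictions
# that carry a systematic (multiplicative + additive) error.
n_gold, n_unlab = 200, 2000
y_gold = rng.normal(5.0, 2.0, n_gold)
yhat_gold = 0.8 * y_gold + 1.0 + rng.normal(0, 0.5, n_gold)
yhat_unlab = (
    0.8 * rng.normal(5.0, 2.0, n_unlab) + 1.0 + rng.normal(0, 0.5, n_unlab)
)

# Step 1: on the gold-standard sample, model the relationship between
# observed outcomes and predictions (here: ordinary least squares).
slope, intercept = np.polyfit(yhat_gold, y_gold, 1)

# Step 2: calibrate the predictions in the large unlabeled sample.
y_calib = intercept + slope * yhat_unlab

# Step 3: inflate the naive standard error with the residual variance
# from the calibration fit (illustrative stand-in for a scaling factor).
resid_var = np.var(y_gold - (intercept + slope * yhat_gold), ddof=2)
naive_se = y_calib.std(ddof=1) / np.sqrt(n_unlab)
se = np.sqrt(naive_se**2 + resid_var / n_unlab)

est = y_calib.mean()
ci = (est - 1.96 * se, est + 1.96 * se)
print(f"estimate: {est:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The naive analysis would average `yhat_unlab` directly (biased toward its shifted mean of 5 only by construction here; in general it is biased) and use `naive_se`, which understates uncertainty; the calibrated estimate recenters on the true mean and widens the interval.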