No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference

๐Ÿ“… 2025-05-26
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This paper investigates the statistical performance of Prediction-Powered Inference, specifically its adaptive variant PPI++, under limited-sample regimes, characterizing when its estimation error degrades relative to inference using only ground-truth labels. Through a non-asymptotic analysis and a correlation-driven error decomposition, the authors derive an exact finite-sample error characterization for PPI++ mean estimation. They establish a "no-free-lunch" result: PPI++ improves estimation accuracy if and only if the correlation between pseudo-labels and ground-truth labels exceeds a threshold that depends on the number of labeled samples $n$; in the Gaussian setting the threshold is $1/\sqrt{n-2}$, with an analogous result for binary labels. They further analyze trade-offs between single-sample and sample-splitting variants of PPI++. All theoretical findings are empirically validated on real-world datasets, confirming that the predicted correlation threshold accurately governs performance gains.

๐Ÿ“ Abstract
Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has established an asymptotic "free lunch" for PPI++, an adaptive form of PPI, showing that the *asymptotic* variance of PPI++ is always less than or equal to the variance obtained from using gold-standard labels alone. Notably, this result holds *regardless of the quality of the pseudo-labels*. In this work, we demystify this result by conducting an exact finite-sample analysis of the estimation error of PPI++ on the mean estimation problem. We give a "no free lunch" result, characterizing the settings (and sample sizes) where PPI++ has provably worse estimation error than using gold-standard labels alone. Specifically, PPI++ will outperform if and only if the correlation between pseudo- and gold-standard labels is above a certain level that depends on the number of labeled samples ($n$). In some cases our results simplify considerably: for Gaussian data, the correlation must be at least $1/\sqrt{n - 2}$ in order to see improvement, and a similar result holds for binary labels. In experiments, we illustrate that our theoretical findings hold on real-world datasets, and give insights into trade-offs between single-sample and sample-splitting variants of PPI++.
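To make the objects in the abstract concrete, below is a minimal NumPy sketch of a PPI++-style mean estimate with a data-estimated tuning parameter, compared against the classical estimate that uses gold-standard labels only, alongside the Gaussian-case threshold $1/\sqrt{n-2}$. This is an illustrative sketch, not the paper's code: the function name `ppi_pp_mean` and the particular plug-in estimate of the tuning parameter $\lambda$ are assumptions based on the standard power-tuned PPI++ mean estimator.

```python
import numpy as np

def ppi_pp_mean(y_labeled, f_labeled, f_unlabeled):
    """Sketch of a PPI++-style mean estimate with a data-estimated lambda.

    y_labeled:   gold-standard labels on the n labeled points
    f_labeled:   pseudo-labels (model predictions) on the same n points
    f_unlabeled: pseudo-labels on the N unlabeled points
    """
    n, N = len(y_labeled), len(f_unlabeled)
    # Power-tuning parameter: scaled regression of Y on f (assumed plug-in form).
    lam = (N / (N + n)) * np.cov(y_labeled, f_labeled)[0, 1] / np.var(f_labeled, ddof=1)
    # Classical mean plus a correction term built from the cheap pseudo-labels.
    return np.mean(y_labeled) + lam * (np.mean(f_unlabeled) - np.mean(f_labeled))

# Toy Gaussian example: correlation rho above 1/sqrt(n-2), so PPI++ should help.
rng = np.random.default_rng(0)
n, N, rho = 100, 10_000, 0.3
cov = [[1.0, rho], [rho, 1.0]]
labeled = rng.multivariate_normal([0.0, 0.0], cov, size=n)   # columns: (Y, f)
f_unlabeled = rng.normal(0.0, 1.0, size=N)

print("classical mean       :", labeled[:, 0].mean())
print("PPI++ mean (sketch)  :", ppi_pp_mean(labeled[:, 0], labeled[:, 1], f_unlabeled))
print("threshold 1/sqrt(n-2):", 1 / np.sqrt(n - 2))
```

With $n = 100$ the threshold is roughly $0.10$, so a correlation of $0.3$ falls in the regime where the paper's result predicts PPI++ improves on the classical estimate; setting `rho` below the threshold illustrates the "no free lunch" regime.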
Problem

Research questions and friction points this paper is trying to address.

Analyzes finite-sample error of PPI++ mean estimation
Identifies conditions where PPI++ underperforms gold-standard labels
Determines minimum pseudo-label correlation for PPI++ improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact finite-sample analysis of PPI++
Correlation threshold determines PPI++ performance
Gaussian and binary label specific results
๐Ÿ”Ž Similar Papers
No similar papers found.
Pranav Mani
Abridge AI
Peng Xu
Abridge AI
Zachary C. Lipton
Raj Reddy Associate Professor of Machine Learning @ Carnegie Mellon; Cofounder & CTO @ Abridge
Machine Learning · Healthcare · Technology & Society · NLP · Robustness & Adaptivity
Michael Oberst
Abridge AI, Department of Computer Science, Johns Hopkins University