🤖 AI Summary
Traditional conformal prediction often yields overly wide, uninformative prediction sets when calibration data are scarce. To address this, the paper proposes a novel conformal prediction framework that integrates synthetic data (e.g., from generative models) while guaranteeing finite-sample coverage without distributional assumptions. Its core innovation is a provably sound "score transporter": an empirical quantile mapping that aligns nonconformity scores from real data with those from synthetic data, enabling cross-domain calibration even when the two score distributions differ. Because the coverage guarantee holds without assuming the synthetic data match the real distribution, the method tightens prediction sets in a fully distribution-free way. Experiments on image classification and tabular regression demonstrate significantly smaller, more informative prediction sets while strictly maintaining the nominal coverage level.
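To see why scarce calibration data force wide sets, recall the standard split conformal calibration step: with n calibration scores, the prediction-set threshold is the ceil((n + 1)(1 - alpha)) / n empirical quantile of the scores. The sketch below (function name is ours, not from the paper) shows that for small n this corrected level saturates at 1.0, so the threshold becomes the maximum observed score:

```python
import numpy as np

def split_conformal_quantile(scores, alpha):
    """Finite-sample-valid score threshold for split conformal prediction.

    With n calibration scores, coverage >= 1 - alpha under exchangeability
    requires the ceil((n + 1) * (1 - alpha)) / n empirical quantile.
    """
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

# Few-shot example: n = 9 scores at alpha = 0.1 gives
# level = ceil(10 * 0.9) / 9 = 1.0, so the threshold is the
# largest score and the prediction sets are maximally wide.
```

This is exactly the regime SPPI targets: borrowing synthetic scores effectively enlarges the calibration sample, so the quantile correction becomes less coarse.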
📝 Abstract
Conformal prediction is a framework for predictive inference with a distribution-free, finite-sample guarantee. However, it tends to provide uninformative prediction sets when calibration data are scarce. This paper introduces Synthetic-powered predictive inference (SPPI), a novel framework that incorporates synthetic data -- e.g., from a generative model -- to improve sample efficiency. At the core of our method is a score transporter: an empirical quantile mapping that aligns nonconformity scores from trusted, real data with those from synthetic data. By carefully integrating the score transporter into the calibration process, SPPI provably achieves finite-sample coverage guarantees without making any assumptions about the real and synthetic data distributions. When the score distributions are well aligned, SPPI yields substantially tighter and more informative prediction sets than standard conformal prediction. Experiments on image classification and tabular regression demonstrate notable improvements in predictive efficiency in data-scarce settings.
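The score transporter itself is developed with provable guarantees in the paper; as a rough illustration of the underlying idea only, here is a minimal empirical quantile mapping (the classic one-dimensional quantile-quantile, i.e. optimal-transport, map) between two score samples. Function and variable names are ours, and this sketch deliberately omits the calibration adjustments that give SPPI its finite-sample coverage guarantee:

```python
import numpy as np

def quantile_transport(synthetic_scores, real_scores):
    """Map synthetic nonconformity scores onto the real-score scale.

    A synthetic score at empirical quantile q of the synthetic sample is
    replaced by the q-th empirical quantile of the real sample. Integer
    arithmetic for the index avoids floating-point ceil edge cases.
    """
    syn = np.asarray(synthetic_scores, dtype=float)
    real = np.sort(np.asarray(real_scores, dtype=float))
    n_syn, n_real = len(syn), len(real)
    # Rank of each synthetic score within its own sample (0-based).
    ranks = np.argsort(np.argsort(syn))
    # idx = ceil(((rank + 1) / n_syn) * n_real) - 1, in exact integers.
    idx = ((ranks + 1) * n_real + n_syn - 1) // n_syn - 1
    return real[idx]
```

When the real and synthetic score distributions are well aligned, this map is close to the identity and the transported synthetic scores behave like extra real calibration points; when they are misaligned, the map absorbs the distribution shift before calibration.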