🤖 AI Summary
Under data-driven selection, conventional conformal prediction intervals are valid only marginally and can fail to cover the outcomes of the selected (focal) units because of selection bias. Method: We propose a framework for constructing prediction sets with finite-sample exact coverage conditional on a unit being selected. It extends Mondrian conformal prediction to multiple test units and non-equivariant classifiers, and accommodates arbitrary selection rules that are invariant to permutations of the calibration units. Computationally efficient implementations are worked out for top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. Contribution/Results: The method is demonstrated on drug discovery and health risk prediction tasks, where it provides selection-conditional coverage for focal units under general selection mechanisms.
📝 Abstract
Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn test point with a prescribed probability. However, in practice, data-driven methods are often used to identify specific test unit(s) of interest, requiring uncertainty quantification tailored to these focal units. In such cases, marginally valid conformal prediction intervals may fail to provide valid coverage for the focal unit(s) due to selection bias. This paper presents a general framework for constructing a prediction set with finite-sample exact coverage, conditional on the unit being selected by a given procedure. The general form of our method accommodates arbitrary selection rules that are invariant to the permutation of the calibration units, and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We also work out computationally efficient implementations of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction.
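The selection bias described above can be illustrated with a toy simulation. The sketch below is not the paper's method: it only shows how a standard split conformal interval, which is valid marginally, can under-cover the units picked by a top-K rule. Everything in it is an assumption made for illustration: the data model (`y = x * noise`, so noise grows with the feature `x`), the constant-zero predictor, the absolute-residual score, and the helper names `conformal_quantile`, `draw_unit`, and `simulate`.

```python
import math
import random


def conformal_quantile(cal_scores, alpha):
    """Split conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration score, or +inf when that rank exceeds n."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def draw_unit(rng):
    """Hypothetical data model: feature x in [0, 1), outcome y = x * noise,
    so outcomes are noisier for larger x."""
    x = rng.random()
    y = x * rng.gauss(0.0, 1.0)
    return x, y


def simulate(n_cal=2000, n_test=20000, top_k=1000, alpha=0.1, seed=0):
    """Return (coverage over all test units, coverage over the top-K units)."""
    rng = random.Random(seed)
    cal = [draw_unit(rng) for _ in range(n_cal)]
    test = [draw_unit(rng) for _ in range(n_test)]

    # The predictor is the constant 0, so the score is |y - 0| = |y| and
    # the marginal prediction interval for every unit is [-q, q].
    q = conformal_quantile([abs(y) for _, y in cal], alpha)

    cov_all = sum(abs(y) <= q for _, y in test) / n_test

    # Top-K selection: keep the test units with the largest feature value.
    selected = sorted(test, key=lambda unit: unit[0], reverse=True)[:top_k]
    cov_sel = sum(abs(y) <= q for _, y in selected) / top_k
    return cov_all, cov_sel


if __name__ == "__main__":
    cov_all, cov_sel = simulate()
    print(f"coverage over all test units: {cov_all:.3f}")
    print(f"coverage over top-K units:    {cov_sel:.3f}")
```

In this toy setting the interval `[-q, q]` is calibrated for a randomly drawn unit, but the selected units are exactly those with the largest noise scale, so their empirical coverage falls below the nominal level. A selection-conditional procedure of the kind the abstract describes is meant to close this gap.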