Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage

📅 2024-03-06
📈 Citations: 7
Influential: 0
🤖 AI Summary
When test units are chosen by a data-driven selection rule, conventional conformal prediction intervals remain marginally valid but can fail to cover the outcomes of the selected (focal) units, because of selection bias. Method: The paper presents a general framework for constructing prediction sets with finite-sample exact coverage, conditional on a unit being selected. The framework accommodates arbitrary selection rules that are invariant to permutations of the calibration units, and it generalizes Mondrian conformal prediction to multiple test units and non-equivariant classifiers. Computationally efficient implementations are worked out for top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. Contribution/Results: The methods are demonstrated on applications in drug discovery and health risk prediction.

📝 Abstract
Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn test point with a prescribed probability. However, in practice, data-driven methods are often used to identify specific test unit(s) of interest, requiring uncertainty quantification tailored to these focal units. In such cases, marginally valid conformal prediction intervals may fail to provide valid coverage for the focal unit(s) due to selection bias. This paper presents a general framework for constructing a prediction set with finite-sample exact coverage, conditional on the unit being selected by a given procedure. The general form of our method accommodates arbitrary selection rules that are invariant to the permutation of the calibration units, and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We also work out computationally efficient implementation of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction.
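The selection-bias problem described in the abstract can be seen in a small simulation. The sketch below is an illustration only, not the paper's method: the data-generating model, the fixed predictor mu(x) = x, the top-K selection rule, and the function names `conformal_quantile` and `simulate` are all assumptions made here. It builds standard split conformal intervals and compares coverage over all test units with coverage over the units selected for having the largest predictions.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]


def simulate(n_cal=200, n_test=100, top_k=5, alpha=0.1, n_rep=500, seed=0):
    """Return (coverage over all test units, coverage over the top_k selected units)."""
    rng = np.random.default_rng(seed)
    hits_all = 0
    hits_sel = 0
    for _ in range(n_rep):
        # Hypothetical setup: the fitted model predicts mu(x) = x, and the
        # noise level grows with x, so large predictions are less reliable.
        x_cal = rng.normal(size=n_cal)
        y_cal = x_cal + (1.0 + np.maximum(x_cal, 0.0)) * rng.normal(size=n_cal)
        x_te = rng.normal(size=n_test)
        y_te = x_te + (1.0 + np.maximum(x_te, 0.0)) * rng.normal(size=n_test)

        # Standard split conformal interval: mu(x) +/- q for every test unit.
        q = conformal_quantile(np.abs(y_cal - x_cal), alpha)
        covered = np.abs(y_te - x_te) <= q

        # Data-driven selection: focal units are the top_k largest predictions.
        selected = np.argsort(x_te)[-top_k:]
        hits_all += int(covered.sum())
        hits_sel += int(covered[selected].sum())
    return hits_all / (n_rep * n_test), hits_sel / (n_rep * top_k)


if __name__ == "__main__":
    marginal, focal = simulate()
    print(f"coverage over all test units: {marginal:.3f}")
    print(f"coverage over selected units: {focal:.3f}")
```

Under these assumptions, coverage averaged over all test units should sit near the nominal 1 - alpha, while coverage restricted to the selected units falls well below it. That gap is what a selection-conditional guarantee is meant to close.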
Problem

Research questions and friction points this paper is trying to address.

Ensures valid coverage for selected focal units
Addresses selection bias in conformal prediction
Generalizes Mondrian conformal prediction to multiple test units and non-equivariant classifiers
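The difference between the two guarantees can be written in symbols. The notation below is an assumption introduced here for illustration (calibration units indexed 1..n, test units n+1..n+m, a prediction set \hat{C}_\alpha, and a data-driven set \hat{S} of selected test indices); the paper's own notation and exact statement may differ.

```latex
\begin{align*}
% Marginal coverage: holds on average over a randomly drawn test unit.
\mathbb{P}\bigl( Y_{n+j} \in \hat{C}_{\alpha}(X_{n+j}) \bigr) &\ge 1 - \alpha, \\
% Selection-conditional coverage: holds among units picked by the selection rule.
\mathbb{P}\bigl( Y_{n+j} \in \hat{C}_{\alpha}(X_{n+j}) \,\big|\, j \in \hat{S} \bigr) &\ge 1 - \alpha .
\end{align*}
```

The first inequality does not imply the second, because the event that unit j is selected can depend on the same data that determine whether the set covers.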
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional coverage for selected focal units
Arbitrary selection rules that are invariant to permutations of the calibration units
Efficient implementation for realistic selection scenarios
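For orientation, the abstract describes the framework as a generalization of Mondrian conformal prediction. The sketch below shows only the standard Mondrian idea with a fixed, pre-specified grouping, not the paper's selection-conditional procedure: the conformal threshold is calibrated separately within each group, so coverage holds group by group. The data-generating model, the grouping rule `x > cut`, and the function names are assumptions made here.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]


def mondrian_coverage(n_cal=500, n_test=500, alpha=0.1, n_rep=300, seed=0, cut=1.0):
    """Per-group coverage when the threshold is calibrated within each group."""
    rng = np.random.default_rng(seed)
    hits = {False: 0, True: 0}
    totals = {False: 0, True: 0}
    for _ in range(n_rep):
        # Hypothetical setup: predictor mu(x) = x with noise that grows with x.
        x_cal = rng.normal(size=n_cal)
        y_cal = x_cal + (1.0 + np.maximum(x_cal, 0.0)) * rng.normal(size=n_cal)
        x_te = rng.normal(size=n_test)
        y_te = x_te + (1.0 + np.maximum(x_te, 0.0)) * rng.normal(size=n_test)
        s_cal = np.abs(y_cal - x_cal)
        s_te = np.abs(y_te - x_te)

        # Fixed grouping chosen in advance: group True is x > cut, False otherwise.
        for g in (False, True):
            in_cal = (x_cal > cut) == g
            in_te = (x_te > cut) == g
            # Calibrate using only calibration units from the same group.
            q = conformal_quantile(s_cal[in_cal], alpha)
            hits[g] += int((s_te[in_te] <= q).sum())
            totals[g] += int(in_te.sum())
    return {g: hits[g] / totals[g] for g in hits}


if __name__ == "__main__":
    for group, cov in mondrian_coverage().items():
        print(f"group x > cut is {group}: coverage {cov:.3f}")
```

Group-wise calibration works here because group membership is fixed in advance. When the focal units are chosen by a data-driven rule, the reference set of calibration units has to account for that rule, which is the setting the paper addresses.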
Ying Jin
Data Science Initiative and Department of Health Care Policy, Harvard University
Zhimei Ren
University of Pennsylvania
Statistics, Multiple Hypothesis Testing, Distribution-free Inference, Data-driven Decision-making