🤖 AI Summary
This work addresses a core difficulty in online selective conformal prediction: data-driven selection mechanisms in the online setting are typically asymmetric, so existing selective conformal prediction methods cannot guarantee selection-conditional coverage for arbitrary selection rules. To overcome this limitation, the authors propose PEMI (PErmutation-based Mondrian Conformal Inference), which constructs the set of data permutations that preserve the observed selection event and calibrates prediction sets over this set. Under standard exchangeability conditions, PEMI achieves exact finite-sample selection-conditional coverage for any asymmetric online selection rule and any prediction model. The framework naturally accommodates additional offline labeled data and selection mechanisms involving multiple test samples, and it achieves false coverage rate (FCR) control under fine-grained selection taxonomies. It combines ideas from full and Mondrian conformal prediction with selection-preserving permutations and Monte Carlo approximation, and the authors derive efficient instantiations for covariate-based rules, conformal p-value and e-value based procedures, and selection based on earlier outcomes. Experiments on a real drug discovery dataset and on simulations demonstrate PEMI's validity across diverse online selection rules, consistent with its theoretical exact-coverage guarantee.
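In symbols, the coverage target and a full-permutation version of the calibration step can be sketched as follows. The notation is our own and not necessarily the paper's: $Z_j = (X_j, Y_j)$ for $j = 1, \dots, t$ with unit $t$ the test unit, $S$ the (possibly asymmetric) selection rule applied to an ordered sequence, $V$ a conformity score, $Z^y$ the data with the unobserved test label replaced by a candidate $y$ (so $Z^y_j = Z_j$ for $j < t$ and $Z^y_t = (X_t, y)$), and $\mathfrak{S}_t$ the set of permutations of $\{1, \dots, t\}$. For simplicity, the conditioning event is taken to be only "the test unit is selected".

```latex
% Target: selection-conditional coverage for the test unit
\mathbb{P}\bigl(Y_t \in \hat{C}(X_t) \,\bigm|\, S(Z_1, \dots, Z_t) = 1\bigr) \ \ge\ 1 - \alpha

% Selection-preserving reference set for a candidate label y
\mathcal{R}(y) = \bigl\{ \pi \in \mathfrak{S}_t :\ S\bigl(Z^y_{\pi(1)}, \dots, Z^y_{\pi(t)}\bigr) = 1 \bigr\}

% Prediction set: keep y unless its score falls in the top alpha-fraction over R(y)
\hat{C}(X_t) = \Bigl\{ y :\ \frac{1}{|\mathcal{R}(y)|} \sum_{\pi \in \mathcal{R}(y)}
  \mathbf{1}\bigl\{ V\bigl(Z^y_{\pi(t)}\bigr) \ge V(X_t, y) \bigr\} > \alpha \Bigr\}
```

Under exchangeability, conditional on the multiset of data points and on the selection event, the observed ordering is uniformly distributed over the reference set; this rank argument is what yields coverage of at least $1 - \alpha$. When $S$ reads only covariates, $\mathcal{R}(y)$ does not depend on $y$; when it reads earlier outcomes, a permutation can move the test unit into a position whose label the rule uses, so the reference set generally has to be recomputed per candidate $y$.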
📝 Abstract
Selective conformal prediction aims to construct prediction sets with valid coverage for a test unit conditional on it being selected by a data-driven mechanism. While existing methods in the offline setting handle any selection mechanism that is permutation invariant to the labeled data, their extension to the online setting, where data arrives sequentially and later decisions depend on earlier ones, is challenged by the fact that the selection mechanism is naturally asymmetric. As a result, existing methods only address a limited collection of selection mechanisms. In this paper, we propose PErmutation-based Mondrian Conformal Inference (PEMI), a general permutation-based framework for selective conformal prediction with arbitrary asymmetric selection rules. Motivated by full and Mondrian conformal prediction, PEMI identifies all permutations of the observed data (or a Monte Carlo subset thereof) that lead to the same selection event, and calibrates a prediction set using conformity scores over this selection-preserving reference set. Under standard exchangeability conditions, our prediction sets achieve finite-sample exact selection-conditional coverage for any asymmetric selection mechanism and any prediction model. PEMI naturally incorporates additional offline labeled data, extends to selection mechanisms with multiple test samples, and achieves false coverage rate (FCR) control with fine-grained selection taxonomies. We further work out several efficient instantiations for commonly used online selection rules, including covariate-based rules, procedures based on conformal p-values or e-values, and selection based on earlier outcomes. Finally, we demonstrate the efficacy of our methods across various selection rules on a real drug discovery dataset and investigate their performance via simulations.
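To make the Monte Carlo reference-set construction concrete, here is a minimal NumPy sketch under several assumptions of our own: a single test unit arriving last in a stream of t units, a selection rule that reads only covariates (the hypothetical `ewma_select`, which thresholds against an exponentially weighted average of earlier covariates and is therefore asymmetric in arrival order), a fixed pre-trained predictor `mu`, absolute-residual conformity scores, and the coarse event "the test unit is selected". The names `ewma_select`, `pemi_interval_sketch`, and `simulate_selective_coverage` are hypothetical. This is an illustration of the idea, not the authors' implementation, and it omits offline data, multiple test samples, and FCR control.

```python
import numpy as np


def ewma_select(x_ordered, decay=0.7):
    """Hypothetical asymmetric online rule (not from the paper): select the last
    unit if its covariate exceeds an exponentially weighted average of the earlier
    covariates. Accepts one ordering (1-D) or a batch of orderings (2-D rows)."""
    x_ordered = np.atleast_2d(x_ordered)
    past = x_ordered[:, :-1]
    # Most recent earlier unit gets weight 1, the one before it decay, and so on.
    w = decay ** np.arange(past.shape[1] - 1, -1, -1)
    return x_ordered[:, -1] > past @ (w / w.sum())


def pemi_interval_sketch(x, y_past, mu, select, alpha=0.1, n_perm=200, rng=None):
    """Monte Carlo selection-preserving permutation interval (illustrative sketch).

    x: covariates of units 1..t in arrival order; the test unit is last.
    y_past: labels of units 1..t-1. mu: fixed, pre-trained point predictor.
    select: covariate-only rule returning, per ordering, whether the last unit
    is selected. Returns (lo, hi), or None if the test unit is not selected.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = len(x)
    if not select(x)[0]:
        return None
    # Uniformly random orderings of the t units (argsort of i.i.d. uniforms).
    perms = np.argsort(rng.random((n_perm, t)), axis=1)
    # Keep orderings that reproduce the observed event "last position selected";
    # the observed (identity) ordering is always part of the reference set.
    kept = perms[select(x[perms])]
    at_test_pos = np.concatenate(([t - 1], kept[:, -1]))
    m = len(at_test_pos)
    n_self = int(np.sum(at_test_pos == t - 1))
    # Absolute-residual scores of labeled units that land in the test position.
    scores_past = np.abs(y_past - mu(x[:-1]))
    s = np.sort(scores_past[at_test_pos[at_test_pos != t - 1]])[::-1]
    # A candidate y is kept iff (n_self + #{s >= |y - mu(x_t)|}) / m > alpha,
    # i.e. iff |y - mu(x_t)| is at most the (floor(alpha*m - n_self) + 1)-th largest s.
    need = alpha * m - n_self
    if need < 0:
        return -np.inf, np.inf
    q = s[int(np.floor(need))]
    center = mu(x[-1:])[0]
    return center - q, center + q


def simulate_selective_coverage(n_trials=2000, t=20, alpha=0.1, seed=0):
    """Empirical coverage among selected test units on heteroscedastic toy data."""
    rng = np.random.default_rng(seed)
    n_sel, n_cov = 0, 0
    for _ in range(n_trials):
        x = rng.normal(size=t)
        y = x + rng.normal(size=t) * (0.5 + np.abs(x))  # noise grows with |x|
        out = pemi_interval_sketch(x, y[:-1], mu=lambda z: z,
                                   select=ewma_select, alpha=alpha, rng=rng)
        if out is not None:
            n_sel += 1
            n_cov += bool(out[0] <= y[-1] <= out[1])
    return n_sel, n_cov / n_sel
```

Sampling uniform permutations and keeping only those that reproduce the selection event yields i.i.d. draws from the selection-preserving reference set; adding the observed ordering keeps the whole collection exchangeable, which is why a Monte Carlo subset should still give valid rather than merely approximate coverage. Calling `simulate_selective_coverage()` is only a toy sanity check of that argument on synthetic data, not a result from the paper.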