Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes post-hoc conformal selection (PH-CS), a framework that overcomes the rigidity of traditional conformal selection methods, which require a pre-specified false discovery rate (FDR) level and therefore cannot adapt the balance between selection size and FDR to the observed data. PH-CS removes the need for a preset FDR level by constructing a path of candidate selection sets, each paired with a data-driven estimate of its false discovery proportion (FDP). Building on conformal e-values and the e-BH procedure, the framework lets users choose an operating point on this path by maximizing a custom utility function. The method provides FDP estimates that are reliable on average in finite samples, extends to general risk control, and, in both synthetic and real-data experiments, satisfies utility constraints while producing reliable FDP estimates and maintaining competitive FDR control.

📝 Abstract

Conformal selection (CS) uses calibration data to identify test inputs whose unobserved outcomes are likely to satisfy a pre-specified minimal quality requirement, while controlling the false discovery rate (FDR). Existing methods fix the target FDR level before observing data, which prevents the user from adapting the balance between the number of selected test inputs and the FDR to downstream needs and constraints based on the available data. For example, in genomics or neuroimaging, researchers often inspect the distribution of test statistics and decide how aggressively to pursue candidates based on the observed evidence strength and the available follow-up resources. To address this limitation, we introduce post-hoc CS (PH-CS), which generates a path of candidate selection sets, each paired with a data-driven false discovery proportion (FDP) estimate. PH-CS lets the user select any operating point on this path by maximizing a user-specified utility, arbitrarily balancing selection size and FDR. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS is proved to provide a finite-sample post-hoc reliability guarantee whereby the ratio between estimated FDP level and true FDP is, on average, upper bounded by $1$, so that the average estimated FDP is, to first order, a valid upper bound on the true FDR. PH-CS is further extended to quality requirements defined in terms of a general risk. Experiments on synthetic and real-world datasets demonstrate that, unlike CS, PH-CS can consistently satisfy user-imposed utility constraints while producing reliable FDP estimates and maintaining competitive FDR control.
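The idea of a selection path with per-set FDP estimates can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the e-values for the $m$ test inputs are already available (the conformal construction is not reproduced here), and it uses the standard e-BH threshold, under which the $k$ inputs with the largest e-values form a valid selection at level $m / (k \, e_{(k)})$, where $e_{(k)}$ is the $k$-th largest e-value. The function names `ebh_path` and `choose_operating_point`, and the example utility, are hypothetical.

```python
import numpy as np


def ebh_path(e_values):
    """Return (order, levels) for an e-BH style selection path.

    order[:k] holds the indices of the k largest e-values, and levels[k-1]
    is the level m / (k * e_(k)), capped at 1, paired with that set.
    Assumption: e_values are valid e-values supplied by the caller.
    """
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e, kind="stable")  # indices by decreasing e-value
    sorted_e = e[order]
    ks = np.arange(1, m + 1)
    with np.errstate(divide="ignore"):  # an e-value of 0 gives level inf
        levels = m / (ks * sorted_e)
    return order, np.minimum(levels, 1.0)


def choose_operating_point(e_values, utility):
    """Pick the point on the path that maximizes utility(size, level)."""
    order, levels = ebh_path(e_values)
    scores = [utility(k, levels[k - 1]) for k in range(1, order.size + 1)]
    best_k = int(np.argmax(scores)) + 1
    return order[:best_k], float(levels[best_k - 1])


# Hypothetical utility: reward selection size, penalize the estimated FDP.
selected, fdp_hat = choose_operating_point(
    [20.0, 0.5, 10.0, 1.0, 8.0],
    utility=lambda size, level: size - 10.0 * level,
)
```

With these illustrative e-values, the utility is maximized by the three inputs with the largest e-values. A different utility, for example one that caps the selection size at the available follow-up budget, would move the operating point along the same path without recomputing it.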

Problem

Research questions and friction points this paper is trying to address.

conformal selection

false discovery rate

post-hoc

e-variables

FDR control

Innovation

Methods, ideas, or system contributions that make the work stand out.

post-hoc conformal selection

e-variables

false discovery rate