Multivariate Conformal Selection

📅 2025-05-01
🤖 AI Summary
This paper addresses the selection of high-quality candidates when responses are multivariate, as in drug discovery, precision medicine, and LLM alignment. Existing Conformal Selection (CS) handles only univariate responses and scalar criteria, and the paper generalizes it to the multivariate setting as Multivariate Conformal Selection (mCS). To obtain finite-sample false discovery rate (FDR) control, it introduces the notion of regional monotonicity and builds conformal p-values from multivariate nonconformity scores. Two variants are proposed: mCS-dist, which uses distance-based scores, and mCS-learn, which learns scores through differentiable optimization. Experiments on synthetic and real-world datasets show that mCS significantly improves selection power while maintaining FDR control.

📝 Abstract
Selecting high-quality candidates from large datasets is critical in applications such as drug discovery, precision medicine, and alignment of large language models (LLMs). While Conformal Selection (CS) provides rigorous uncertainty quantification, it is limited to univariate responses and scalar criteria. To address this issue, we propose Multivariate Conformal Selection (mCS), a generalization of CS designed for multivariate response settings. Our method introduces regional monotonicity and employs multivariate nonconformity scores to construct conformal p-values, enabling finite-sample False Discovery Rate (FDR) control. We present two variants: mCS-dist, using distance-based scores, and mCS-learn, which learns optimal scores via differentiable optimization. Experiments on simulated and real-world datasets demonstrate that mCS significantly improves selection power while maintaining FDR control, establishing it as a robust framework for multivariate selection tasks.
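The general recipe described in the abstract (score each candidate, turn scores into conformal p-values against a calibration set, then select under an FDR budget) can be sketched as follows. This is a simplified illustration, not the paper's mCS-dist or mCS-learn scores: the target region is assumed to be an orthant `{y : y_k >= lower_k for all k}`, the helper `orthant_confidence` is a hypothetical stand-in score, the p-value construction follows the usual conformal selection pattern of ranking a test point against calibration points whose true response falls outside the region, and selection uses the standard Benjamini-Hochberg step-up rule.

```python
import numpy as np

def orthant_confidence(pred, lower):
    """Hypothetical score: worst-case margin of a multivariate prediction
    over the per-coordinate thresholds (larger = more confidently in region)."""
    pred = np.asarray(pred, dtype=float)
    lower = np.asarray(lower, dtype=float)
    return np.min(pred - lower, axis=1)

def conformal_pvalues(calib_conf, calib_in_region, test_conf):
    """p_j = (1 + #{i : Y_i outside region and conf_i >= conf_j}) / (n + 1),
    where n is the full calibration size."""
    calib_conf = np.asarray(calib_conf, dtype=float)
    in_region = np.asarray(calib_in_region, dtype=bool)
    test_conf = np.asarray(test_conf, dtype=float)
    null_conf = calib_conf[~in_region]  # calibration points that miss the region
    counts = (null_conf[None, :] >= test_conf[:, None]).sum(axis=1)
    return (1.0 + counts) / (len(calib_conf) + 1.0)

def benjamini_hochberg(pvals, alpha):
    """Return sorted indices selected by the BH step-up rule at level alpha."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0]) + 1  # largest rank passing its threshold
    return np.sort(order[:k])
```

In use, `calib_conf` and `test_conf` would come from applying the same fitted predictor and the same score to calibration and test covariates, and `calib_in_region` from checking the observed calibration responses against the region. Small p-values flag test candidates whose predicted responses look more favorable than those of the calibration points that missed the region.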
Problem

Research questions and friction points this paper is trying to address.

Extends conformal selection to multivariate response settings
Controls the false discovery rate in finite samples when selecting on multivariate responses
Enhances selection power for applications like drug discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalizes Conformal Selection for multivariate responses
Introduces regional monotonicity and multivariate nonconformity scores
Offers distance-based and learned score variants
Tian Bai
McGill University, Montreal, Canada
Yue Zhao
Department of Mathematics, University of York, York, UK
Xiang Yu
MRL, Merck & Co., Inc., Rahway, NJ, USA
Archer Y. Yang
Department of Mathematics and Statistics, McGill University
Statistical machine learning, uncertainty quantification, computational statistics