AI Summary
This paper addresses the lack of finite-sample coverage guarantees for uncertainty quantification in multi-output settings. We propose what is, to our knowledge, the first multivariate conformal predictive distribution (MCPD) with rigorous finite-sample calibration. Methodologically, we combine optimal transport theory with the conformal prediction framework: vector ranks and quantile regions are defined via transport maps, and a piecewise-constant assignment coupled with multivariate weighted quantile computation yields an analytically tractable predictive distribution. Our contributions are threefold: (1) We extend conformal prediction beyond univariate scores or scalar outputs to general multivariate outputs; (2) All derived uncertainty regions, regardless of shape or construction, automatically satisfy the user-specified marginal coverage probability; (3) We provide two practical implementations, a conservative deterministic one and an exact randomized one, the latter giving a multivariate generalization of the classical Dempster-Hill procedure.
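For orientation, the one-dimensional Dempster-Hill procedure that the summary refers to can be sketched in a few lines. This is a minimal illustration of the classical scalar case, not the paper's multivariate construction; the function name `dempster_hill_cdf` and the toy data are assumptions made for the example. Given n exchangeable observations, the predictive distribution places mass 1/(n+1) on each of the n+1 intervals between consecutive order statistics, and a randomization variable `tau` decides how the mass at a jump is split.

```python
def dempster_hill_cdf(calib, y, tau):
    """Randomized predictive CDF value at y for scalar data.

    calib: past observations (assumed exchangeable with the next one).
    tau:   number in [0, 1]; drawing it uniformly at random gives the
           randomized version, fixing it at 0 or 1 gives deterministic
           lower and upper envelopes.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    below = sum(1 for v in calib if v < y)
    ties = sum(1 for v in calib if v == y)
    # The candidate value always ties with itself, hence the "+ 1".
    return (below + tau * (ties + 1)) / (len(calib) + 1)


# Hypothetical toy data: four past observations.
calib = [0.3, 1.2, 2.5, 4.0]
lower = dempster_hill_cdf(calib, 2.0, tau=0.0)  # lower envelope at y = 2.0
upper = dempster_hill_cdf(calib, 2.0, tau=1.0)  # upper envelope at y = 2.0
```

Here two of the four observations lie below 2.0, so the envelopes are 2/5 and 3/5, and the gap of 1/(n+1) between them is the mass that the randomization distributes.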
Abstract
Conformal prediction (CP) constructs uncertainty sets for model outputs with finite-sample coverage guarantees. A candidate output is included in the prediction set if its non-conformity score is not considered extreme relative to the scores observed on a set of calibration examples. However, this procedure is only straightforward when scores are scalar-valued, which has limited CP to real-valued scores or ad-hoc reductions to one dimension. The problem of ordering vectors has been studied via optimal transport (OT), which provides a principled method for defining vector-ranks and multivariate quantile regions, though typically with only asymptotic coverage guarantees. We restore finite-sample, distribution-free coverage by conformalizing the vector-valued OT quantile region. Here, a candidate's rank is defined via a transport map computed for the calibration scores augmented with that candidate's score. This defines a continuum of OT problems for which we prove that the resulting optimal assignment is piecewise-constant across a fixed polyhedral partition of the score space. This allows us to characterize the entire prediction set tractably, and provides the machinery to address a deeper limitation of prediction sets: that they only indicate which outcomes are plausible, but not their relative likelihood. In one dimension, conformal predictive distributions (CPDs) fill this gap by producing a predictive distribution with finite-sample calibration. Extending CPDs beyond one dimension remained an open problem. We construct, to our knowledge, the first multivariate CPDs with finite-sample calibration, i.e., they define a valid multivariate distribution where any derived uncertainty region automatically has guaranteed coverage. We present both conservative and exact randomized versions, the latter resulting in a multivariate generalization of the classical Dempster-Hill procedure.
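The rank-by-transport idea can be illustrated with a deliberately naive sketch. Everything below is an assumption made for illustration rather than the paper's algorithm: the helper names, the hand-picked reference points in the unit disk, the toy scores, and the brute-force search over permutations, which is only viable for a handful of points (an assignment solver such as SciPy's `linear_sum_assignment` would be the usual replacement). The sketch re-solves the assignment for each candidate, whereas the abstract's piecewise-constant result is what removes the need to do so.

```python
import math
from itertools import permutations


def sq_dist(p, r):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, r))


def ot_assignment(points, refs):
    """Brute-force optimal assignment of points to reference points."""
    idx = range(len(points))
    return min(permutations(idx),
               key=lambda perm: sum(sq_dist(points[i], refs[perm[i]]) for i in idx))


def in_prediction_set(calib_scores, candidate, refs, alpha):
    """True if the candidate's transport rank is not extreme at level alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Augment the calibration scores with the candidate score, then match
    # the n + 1 points to the n + 1 fixed reference points.
    points = list(calib_scores) + [candidate]
    perm = ot_assignment(points, refs)
    rank_norm = math.hypot(*refs[perm[-1]])
    # Under exchangeability and a unique assignment, the candidate's reference
    # point is uniform over refs, so thresholding at the k-th smallest norm
    # covers with probability at least k / (n + 1) >= 1 - alpha.
    norms = sorted(math.hypot(*r) for r in refs)
    k = math.ceil((1 - alpha) * len(refs))
    return rank_norm <= norms[k - 1]


# Hypothetical reference points: the origin, an inner ring, two outer points.
refs = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5),
        (1.0, 0.0), (-1.0, 0.0)]
# Hypothetical two-dimensional calibration scores.
calib = [(1.0, 0.0), (-1.0, 0.0), (3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]

central = in_prediction_set(calib, (0.0, 0.0), refs, alpha=0.3)
far = in_prediction_set(calib, (5.0, 0.0), refs, alpha=0.3)
```

In this toy configuration the central candidate is matched to the origin and is included, while the far candidate is matched to an outer reference point and is excluded.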