🤖 AI Summary
This work addresses the problem of excessively large, and hence impractical, prediction sets in uncertainty quantification for deep classifiers. Instead of conventional post-hoc calibration, we propose embedding the validity constraints of conformal prediction (CP) directly into the training process. We formulate conformal training as a bilevel optimization problem for the first time: the upper-level objective minimizes the prediction set size, while the lower level adaptively learns the quantile threshold of the conformity scores. Based on this formulation, we design Direct Prediction Set Minimization (DPSM), an end-to-end trainable algorithm. We theoretically establish a learning bound of $O(1/\sqrt{n})$, improving upon existing methods based on stochastic approximation. Empirically, DPSM reduces the average prediction set size by 20.46% across multiple benchmark datasets and state-of-the-art deep models, while maintaining rigorous statistical validity.
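The two levels described above can be illustrated numerically. The sketch below is a toy stand-in, not the authors' DPSM algorithm: it assumes a pinball-loss learner for the lower-level quantile and a sigmoid-smoothed surrogate for the upper-level set size; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1                     # target miscoverage level
scores = rng.normal(size=1000)  # stand-in conformity scores s(x, y)

def pinball_grad(q, s, alpha):
    """Gradient in q of the pinball (quantile) loss at level 1 - alpha."""
    # d/dq of mean((1-alpha)*(s-q)_+ + alpha*(q-s)_+)
    return np.mean(np.where(s > q, -(1 - alpha), alpha))

# Lower level: gradient descent on the pinball loss; its minimizer is the
# empirical (1 - alpha)-quantile of the conformity scores.
q = 0.0
for _ in range(2000):
    q -= 0.05 * pinball_grad(q, scores, alpha)

empirical_q = np.quantile(scores, 1 - alpha)

# Upper level: a smooth surrogate of the prediction-set size given q.
def smooth_set_size(class_scores, q, temp=0.1):
    # sigmoid((q - s)/temp) -> 1 when a class score s is below threshold q,
    # so the sum approximates |{y : s(x, y) <= q}| differentiably.
    return np.sum(1.0 / (1.0 + np.exp(-(q - class_scores) / temp)))
```

In a full conformal training loop, the surrogate set size would be backpropagated through both the classifier and the learned quantile; here the two pieces are only shown in isolation.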
📝 Abstract
Conformal prediction (CP) is a promising uncertainty quantification framework which works as a wrapper around a black-box classifier to construct prediction sets (i.e., subsets of candidate classes) with provable guarantees. However, standard calibration methods for CP tend to produce large prediction sets, which makes them less useful in practice. This paper considers the problem of integrating conformal principles into the training process of deep classifiers to directly minimize the size of prediction sets. We formulate conformal training as a bilevel optimization problem and propose the \emph{Direct Prediction Set Minimization (DPSM)} algorithm to solve it. The key insight behind DPSM is to minimize a measure of the prediction set size (upper level) that is conditioned on the learned quantile of conformity scores (lower level). We prove that DPSM has a learning bound of $O(1/\sqrt{n})$ (with $n$ training samples), while prior conformal training methods based on stochastic approximation of the quantile have a bound of $\Omega(1/s)$ (with batch size $s$, and typically $s \ll \sqrt{n}$). Experiments on various benchmark datasets and deep models show that DPSM significantly outperforms the best prior conformal training baseline with $20.46\%\downarrow$ in prediction set size, validating our theory.
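For context, the standard post-hoc split conformal recipe that the abstract contrasts with can be sketched as follows. This is a minimal NumPy illustration under assumed choices (the score $s(x,y) = 1 - \hat{p}_y(x)$ and random toy softmax outputs), not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cal, n_classes, alpha = 500, 10, 0.1

# Toy calibration data: fake softmax outputs and true labels.
logits = rng.normal(size=(n_cal, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, n_classes, size=n_cal)

# Conformity scores of the true labels on the held-out calibration split.
cal_scores = 1.0 - probs[np.arange(n_cal), labels]

# Finite-sample corrected quantile; thresholding at qhat yields the
# marginal coverage guarantee P(y in C(x)) >= 1 - alpha.
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(cal_scores, min(level, 1.0), method="higher")

def prediction_set(p):
    """All classes y whose score 1 - p[y] does not exceed qhat."""
    return np.where(1.0 - p <= qhat)[0]
```

With an uninformative model like this toy one, the calibrated threshold admits most classes, which is exactly the "large prediction set" failure mode that motivates training-time minimization.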