Direct Prediction Set Minimization via Bilevel Conformal Classifier Training

📅 2025-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of excessively large and impractical prediction sets in uncertainty quantification for deep classifiers. Instead of conventional post-hoc calibration, we propose embedding the validity constraints of conformal prediction (CP) directly into the training process. We formulate conformal training as a bilevel optimization problem for the first time: the upper-level objective minimizes the prediction set size, while the lower level adaptively learns the quantile threshold. Based on this formulation, we design Direct Prediction Set Minimization (DPSM), an end-to-end trainable algorithm. We theoretically establish a learning bound of $O(1/\sqrt{n})$, improving upon existing stochastic approximation methods. Empirically, DPSM reduces the average prediction set size by 20.46% across multiple benchmark datasets and state-of-the-art deep models, while maintaining rigorous statistical validity and demonstrating strong practical effectiveness.

📝 Abstract
Conformal prediction (CP) is a promising uncertainty quantification framework which works as a wrapper around a black-box classifier to construct prediction sets (i.e., subsets of candidate classes) with provable guarantees. However, standard calibration methods for CP tend to produce large prediction sets, which makes them less useful in practice. This paper considers the problem of integrating conformal principles into the training process of deep classifiers to directly minimize the size of prediction sets. We formulate conformal training as a bilevel optimization problem and propose the \emph{Direct Prediction Set Minimization (DPSM)} algorithm to solve it. The key insight behind DPSM is to minimize a measure of the prediction set size (upper level) that is conditioned on the learned quantile of conformity scores (lower level). We show that DPSM has a learning bound of $O(1/\sqrt{n})$ (with $n$ training samples), while prior conformal training methods based on stochastic approximation of the quantile have a bound of $\Omega(1/s)$ (with batch size $s$, and typically $s \ll \sqrt{n}$). Experiments on various benchmark datasets and deep models show that DPSM significantly outperforms the best prior conformal training baseline with a $20.46\%\downarrow$ in prediction set size, validating our theory.
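As background for what DPSM improves upon, the standard post-hoc split conformal procedure mentioned in the abstract can be sketched as follows (a minimal illustration; the function name, score choice of one minus the softmax probability, and miscoverage level are assumptions, not taken from the paper):

```python
import numpy as np

def split_conformal_sets(cal_scores, test_probs, alpha=0.1):
    """Post-hoc split conformal prediction set construction (the baseline
    calibration procedure that conformal training methods aim to improve).

    cal_scores: nonconformity scores of the true class on a held-out
                calibration set (here assumed to be 1 - softmax probability).
    test_probs: (m, K) array of softmax probabilities for m test inputs.
    Returns a boolean (m, K) mask: True means the class is in the set.
    """
    n = len(cal_scores)
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores,
    # which guarantees marginal coverage of at least 1 - alpha.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(cal_scores, q_level, method="higher")
    # Include every class whose nonconformity score is below the threshold.
    return (1.0 - test_probs) <= q_hat
```

The paper's point is that this threshold is fit after training, so the classifier is never pushed to make the resulting sets small; DPSM instead learns the quantile jointly with the model.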
Problem

Research questions and friction points this paper is trying to address.

Minimize prediction set size in conformal classifiers
Integrate conformal principles into deep classifier training
Improve efficiency of uncertainty quantification in predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bilevel optimization for conformal training
Direct Prediction Set Minimization algorithm
Minimizes prediction set size with guarantees
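The bilevel idea behind these contributions can be sketched as a training-time loss (a hypothetical illustration only, not the authors' exact objective; the sigmoid relaxation, temperature, and batch-level quantile estimate are assumptions): the lower level estimates the quantile threshold from the batch's conformity scores, and the upper level penalizes a smooth surrogate of the prediction set size under that threshold.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def set_size_surrogate(probs, labels, alpha=0.1, temp=0.1):
    """Hypothetical smooth surrogate of the expected prediction set size.

    Lower level: estimate the (1 - alpha) quantile q of the true-class
    nonconformity scores on the current batch.
    Upper level: soft-count the classes whose scores fall below q;
    minimizing this count during training shrinks prediction sets.
    """
    scores = 1.0 - probs  # nonconformity: one minus softmax probability
    true_scores = scores[np.arange(len(labels)), labels]
    q = np.quantile(true_scores, 1.0 - alpha)       # lower-level solution
    soft_membership = sigmoid((q - scores) / temp)  # relaxed set indicator
    return soft_membership.sum(axis=1).mean()       # average soft set size
```

In an actual implementation this would be computed on differentiable tensors (e.g., in PyTorch) so gradients flow through both the scores and the quantile estimate, which is where the bilevel structure matters.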
Yuanjie Shi
School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, USA
Hooman Shahrokhi
Washington State University
Xuesong Jia
School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, USA
Xiongzhi Chen
Department of Mathematics and Statistics, Washington State University, Pullman, Washington, USA
J. Doppa
School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, USA
Yan Yan
School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, USA