AI Summary
This work addresses the inefficiency and multiple-testing burden of Bayesian conformal optimization methods that use the same held-out data for both prediction-set selection and coverage validation. The authors propose Decoupled Conformal Optimization (DCO), a framework that separates tuning from calibration: structural choices are optimized on an independent tuning set to improve efficiency, while the conformal quantile is computed on a fresh calibration set. Conditional on the tuned structure, this guarantees finite-sample marginal coverage for any candidate class without a confidence parameter or multiple-testing correction. Empirical evaluations on benchmarks including ImageNet-A, CIFAR-100, and Diabetes show that DCO tracks nominal coverage closely while often reducing prediction-set size or interval width relative to PAC-style calibration; for instance, the average set size on ImageNet-A drops from 26.52 to 25.26.
Abstract
Bayesian conformal optimisation methods often use the same held-out data both to search for efficient prediction sets and to certify coverage or risk. This coupling is natural for high-probability risk-control guarantees, but it is not necessary when the target is standard finite-sample marginal conformal coverage. We propose Decoupled Conformal Optimisation (DCO), a train-tune-calibrate design principle that uses an independent tuning split for efficiency-oriented structural selection and a fresh calibration split for the final conformal quantile. Conditional on the tuned structure, standard split-conformal exchangeability yields finite-sample marginal coverage for any candidate class, without a confidence parameter or multiple-testing correction. DCO therefore targets a different finite-sample guarantee from PAC-style methods: marginal conformal coverage rather than high-probability risk control. Under consistency assumptions on the coupled risk bound, the two approaches nevertheless converge to the same population threshold. Across classification and regression benchmarks, including ImageNet-A, CIFAR-100, Diabetes, California Housing, and Concrete, DCO tracks the nominal coverage level closely while often reducing average prediction-set size or interval width relative to PAC-style calibration. On ImageNet-A, for example, the average set size decreases from $26.52$ to $25.26$ and the 95th-percentile set size from $58.95$ to $53.73$; on Diabetes, the average interval width decreases from $2.098$ to $1.914$.
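The train-tune-calibrate split described above can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate class (two hypothetical scale functions for a normalised residual score), the selection criterion (mean interval width on the tuning split), the fixed `predict` function standing in for a trained model, and the synthetic data are all assumptions made for illustration. The only part taken from standard split-conformal practice is the quantile rule, which uses the ceil((n+1)(1-alpha))-th smallest calibration score.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(scores)[k - 1]

def dco_fit(predict, scale_fns, X_tune, y_tune, X_cal, y_cal, alpha):
    """Select a structure on the tuning split, then calibrate on a fresh split."""
    # Tuning step (assumed criterion): pick the scale function whose
    # intervals are narrowest on average on the tuning split.
    best, best_width = 0, np.inf
    for idx, scale in enumerate(scale_fns):
        s = np.abs(y_tune - predict(X_tune)) / scale(X_tune)
        q_tune = conformal_quantile(s, alpha)
        width = np.mean(2.0 * q_tune * scale(X_tune))
        if width < best_width:
            best, best_width = idx, width

    # Calibration step: the tuned structure is now fixed, so the calibration
    # scores are exchangeable with a test score and the usual split-conformal
    # quantile applies without any multiple-testing correction.
    scale = scale_fns[best]
    s_cal = np.abs(y_cal - predict(X_cal)) / scale(X_cal)
    q = conformal_quantile(s_cal, alpha)

    def interval(X):
        mu, half = predict(X), q * scale(X)
        return mu - half, mu + half

    return best, q, interval

# Synthetic heteroscedastic data (assumption, for illustration only).
rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + (0.1 + x) * rng.standard_normal(n)
    return x, y

predict = lambda x: 2.0 * x  # stand-in for a model fit on the training split
scale_fns = [lambda x: np.ones_like(x), lambda x: 0.1 + x]  # hypothetical candidates

X_tune, y_tune = sample(500)
X_cal, y_cal = sample(500)
X_test, y_test = sample(5000)
alpha = 0.1

best, q, interval = dco_fit(predict, scale_fns, X_tune, y_tune, X_cal, y_cal, alpha)
lo, hi = interval(X_test)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"candidate={best}, coverage={coverage:.3f}, mean width={np.mean(hi - lo):.3f}")
```

The design point is that the tuning split may be searched as aggressively as desired, because the final quantile is computed from data that played no part in the selection. Reusing the tuning data for the final quantile would break the exchangeability argument.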