🤖 AI Summary
This paper addresses two key limitations in conformal prediction: inaccurate difficulty-aware calibration and the inability of prediction set sizes to adapt to sample difficulty. To resolve these, the authors propose a difficulty-aware binning method based on input transformation: differentiable input perturbations generate reliable difficulty rankings, and uniform-mass binning then groups examples stably. They further introduce two adaptive evaluation metrics, difficulty calibration error and conditional coverage deviation, and design a conformal prediction algorithm that employs group-specific conditional thresholds. Evaluated on ImageNet image classification and medical visual acuity prediction tasks, the method significantly improves adaptivity: it reduces average prediction set size by 12.7%–23.4% while maintaining strict marginal coverage guarantees, and achieves more balanced conditional coverage within difficulty groups, outperforming existing conformal prediction baselines.
📝 Abstract
Conformal prediction constructs a set of labels instead of a single point prediction, while providing a probabilistic coverage guarantee. Beyond the coverage guarantee, adaptiveness to example difficulty is an important property: the method should produce larger prediction sets for more difficult examples and smaller ones for easier examples. Existing evaluation methods for adaptiveness typically analyze coverage rate violation or average set size across bins of examples grouped by difficulty. However, these approaches often suffer from imbalanced binning, which can lead to inaccurate estimates of coverage or set size. To address this issue, we propose a binning method that leverages input transformations to sort examples by difficulty, followed by uniform-mass binning. Building on this binning, we introduce two metrics to better evaluate adaptiveness. Because the binning is balanced, these metrics provide more reliable estimates of coverage rate violation and average set size, leading to more accurate adaptivity assessment. Through experiments, we demonstrate that our proposed metrics correlate more strongly with the desired adaptiveness property than existing ones. Furthermore, motivated by our findings, we propose a new adaptive prediction set algorithm that groups examples by estimated difficulty and applies group-conditional conformal prediction, which determines an appropriate threshold for each group. Experimental results on both (a) an image classification task (ImageNet) and (b) a medical task (visual acuity prediction) show that our method outperforms existing approaches according to the new metrics.
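The two mechanisms the abstract describes, uniform-mass binning by difficulty and group-conditional conformal thresholds, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the difficulty scores and nonconformity scores below are synthetic stand-ins (the paper derives difficulty rankings from input transformations), and the function names are hypothetical.

```python
import numpy as np

def uniform_mass_bins(difficulty, n_bins):
    """Sort example indices by difficulty score and split them into
    near-equal-size (uniform-mass) bins."""
    order = np.argsort(difficulty)
    return np.array_split(order, n_bins)

def group_thresholds(nonconformity, bins, alpha=0.1):
    """Standard split-conformal quantile computed separately per bin:
    the ceil((n+1)(1-alpha))/n empirical quantile of the group's
    calibration nonconformity scores."""
    thresholds = []
    for idx in bins:
        n = len(idx)
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        thresholds.append(np.quantile(nonconformity[idx], level, method="higher"))
    return np.array(thresholds)

# Synthetic calibration data: harder examples get larger nonconformity scores.
rng = np.random.default_rng(0)
difficulty = rng.random(1000)
nonconformity = difficulty + 0.1 * rng.random(1000)

bins = uniform_mass_bins(difficulty, n_bins=5)
thresholds = group_thresholds(nonconformity, bins, alpha=0.1)
```

At test time, an example would be assigned to a difficulty group and its prediction set built with that group's threshold, so harder groups (with larger thresholds) naturally yield larger sets, which is exactly the adaptiveness property the evaluation metrics are meant to capture.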