🤖 AI Summary
Under long-tailed label distributions, conventional conformal prediction (CP) methods achieve marginal coverage but suffer from severe coverage imbalance across classes: they over-cover head classes while under-covering tail classes. To address this, we propose Tail-Aware Conformal Prediction (TACP) and its soft variant, sTACP. Methodologically, TACP introduces a class-frequency-aware, non-uniform score adjustment that exploits the long-tail structure, and sTACP adds a reweighting mechanism to further balance coverage across all classes. We show theoretically that TACP consistently achieves a smaller head-tail coverage gap than standard CP methods. Experiments on multiple long-tailed benchmarks show that TACP and sTACP maintain nominal marginal coverage while improving tail-class coverage, which makes prediction sets more reliable for minority classes. The framework can be combined with various non-conformity scores.
📝 Abstract
Conformal Prediction (CP) is a popular method for uncertainty quantification that converts a pretrained model's point prediction into a prediction set, with the set size reflecting the model's confidence. Although existing CP methods are guaranteed to achieve marginal coverage, they often exhibit imbalanced coverage across classes under long-tailed label distributions, tending to over-cover the head classes at the expense of under-covering the remaining tail classes. This under-coverage is particularly concerning because it undermines the reliability of the prediction sets for minority classes, even when coverage is guaranteed on average. In this paper, we propose the Tail-Aware Conformal Prediction (TACP) method, which mitigates the under-coverage of the tail classes by exploiting the long-tail structure and narrowing the head-tail coverage gap. Theoretical analysis shows that TACP consistently achieves a smaller head-tail coverage gap than standard methods. To further improve coverage balance across all classes, we introduce an extension of TACP, soft TACP (sTACP), based on a reweighting mechanism. The proposed framework can be combined with various non-conformity scores, and experiments on multiple long-tailed benchmark datasets demonstrate the effectiveness of our methods.
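To make the head-tail coverage gap concrete, the following is a minimal sketch of standard split conformal prediction (not TACP or sTACP, whose details are not given in the abstract). It assumes the common non-conformity score 1 - p(true class) and a purely synthetic, head-biased classifier. The class prior, the logit model, and all function names are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

# Hypothetical long-tailed class prior (head class first); illustrative only.
PRIOR = np.array([0.6, 0.2, 0.1, 0.06, 0.04])

def simulate(n, rng):
    """Draw long-tailed labels and softmax outputs of a head-biased toy model."""
    k = len(PRIOR)
    labels = rng.choice(k, size=n, p=PRIOR)
    logits = rng.normal(size=(n, k)) + np.log(PRIOR)  # bias toward head classes
    logits[np.arange(n), labels] += 2.0               # signal for the true class
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True), labels

def split_conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Standard split-CP threshold for the score 1 - p(true class)."""
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    rank = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    return 1.0 if rank > n else scores[rank - 1]

def prediction_sets(probs, qhat):
    """Boolean matrix: entry (i, k) is True if class k is in the set of sample i."""
    return (1.0 - probs) <= qhat

def classwise_coverage(sets, labels, num_classes):
    """Empirical coverage per class (NaN for classes absent from `labels`)."""
    covered = sets[np.arange(len(labels)), labels]
    cov = np.full(num_classes, np.nan)
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            cov[k] = covered[mask].mean()
    return cov

def demo(seed=0, alpha=0.1):
    """Calibrate on one synthetic split, then report coverage on another."""
    rng = np.random.default_rng(seed)
    cal_probs, cal_labels = simulate(2000, rng)
    test_probs, test_labels = simulate(5000, rng)
    qhat = split_conformal_threshold(cal_probs, cal_labels, alpha)
    sets = prediction_sets(test_probs, qhat)
    marginal = sets[np.arange(len(test_labels)), test_labels].mean()
    return marginal, classwise_coverage(sets, test_labels, len(PRIOR))

if __name__ == "__main__":
    marginal, per_class = demo()
    print("marginal coverage:", round(float(marginal), 3))
    print("class-wise coverage (head to tail):", np.round(per_class, 3))
```

In this toy setting, a single global threshold is calibrated mostly on head-class scores, so marginal coverage should land near the nominal 1 - alpha while the rarest class is covered less often than the head class. This is the imbalance that TACP and sTACP aim to reduce.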