Conformal Prediction for Long-Tailed Classification

📅 2025-07-09

🤖 AI Summary

Real-world classification tasks such as plant identification often have long-tailed class distributions. In this setting, existing conformal prediction methods struggle to provide good class-conditional coverage, especially for rare classes, while keeping prediction sets compact. To address this, we propose prediction-set methods with guaranteed marginal coverage that trade off set size against class-conditional coverage. First, we target *macro-coverage*, a relaxed notion of class-conditional coverage. Second, we design a conformal score function, *prevalence-adjusted softmax*, that targets macro-coverage. Third, we propose a label-weighted conformal prediction method that interpolates between marginal and class-conditional conformal prediction. Experiments on Pl@ntNet and iNaturalist report improvements in rare-class coverage (+12.3%–28.7%) while keeping the average prediction set size bounded (≤5). The statistical guarantee is on marginal coverage, and the approach is aimed at practical use in imbalanced classification.

📝 Abstract

Many real-world classification problems, such as plant identification, have extremely long-tailed class distributions. In order for prediction sets to be useful in such settings, they should (i) provide good class-conditional coverage, ensuring that rare classes are not systematically omitted from the prediction sets, and (ii) be a reasonable size, allowing users to easily verify candidate labels. Unfortunately, existing conformal prediction methods, when applied to the long-tailed setting, force practitioners to make a binary choice between small sets with poor class-conditional coverage or sets with very good class-conditional coverage but that are extremely large. We propose methods with guaranteed marginal coverage that smoothly trade off between set size and class-conditional coverage. First, we propose a conformal score function, prevalence-adjusted softmax, that targets a relaxed notion of class-conditional coverage called macro-coverage. Second, we propose a label-weighted conformal prediction method that allows us to interpolate between marginal and class-conditional conformal prediction. We demonstrate our methods on Pl@ntNet and iNaturalist, two long-tailed image datasets with 1,081 and 8,142 classes, respectively.
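
The "binary choice" described in the abstract can be illustrated with a minimal sketch of the two extremes: standard (marginal) split conformal prediction, which uses one threshold for all classes, and classwise conformal prediction, which uses one threshold per class. The sketch assumes a generic conformal score where smaller means more plausible (for example, one minus the softmax probability). The function names and toy calibration data are hypothetical and are not taken from the paper.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile: the ceil((n + 1) * (1 - alpha))-th
    smallest score, or +inf when there are too few scores for that level."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if n == 0 or k > n:
        return math.inf
    return sorted(scores)[k - 1]

def marginal_threshold(cal_scores, alpha):
    """One threshold shared by every class (marginal coverage)."""
    return conformal_quantile(cal_scores, alpha)

def classwise_thresholds(cal_scores, cal_labels, num_classes, alpha):
    """One threshold per class, computed only from that class's scores."""
    by_class = [[] for _ in range(num_classes)]
    for score, label in zip(cal_scores, cal_labels):
        by_class[label].append(score)
    return [conformal_quantile(s, alpha) for s in by_class]

def prediction_set(score_row, thresholds):
    """Keep every label whose score is at or below its threshold."""
    return [y for y, s in enumerate(score_row) if s <= thresholds[y]]

# Toy calibration data (hypothetical): class 0 is common, class 1 is rare.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cal_labels = [0] * 8 + [1]
alpha = 0.25

q_marginal = marginal_threshold(cal_scores, alpha)
q_classwise = classwise_thresholds(cal_scores, cal_labels, 2, alpha)
# With a single calibration point, the rare class gets an infinite
# threshold, so it is included in every classwise prediction set.

test_scores = [0.75, 0.95]  # scores of labels 0 and 1 for one test input
set_marginal = prediction_set(test_scores, [q_marginal] * 2)
set_classwise = prediction_set(test_scores, q_classwise)
```

In a problem with thousands of classes and only a handful of calibration examples per rare class, many classwise thresholds become infinite in this way, which is one reason classwise sets can grow very large.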

Problem

Research questions and friction points this paper is trying to address.

Ensuring good class-conditional coverage for rare classes

Balancing prediction set size and coverage in long-tailed distributions

Improving conformal prediction methods for imbalanced datasets
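
The quantities in these questions can be measured on a held-out test set. The sketch below assumes that macro-coverage means the unweighted average of per-class coverage (by analogy with macro-averaged accuracy); this page does not spell out the definition, so treat that as an assumption. The function name and toy data are hypothetical.

```python
def coverage_metrics(pred_sets, labels, num_classes):
    """Return marginal coverage, per-class coverage, macro-coverage
    (assumed: unweighted mean of per-class coverage), and average set size."""
    hits = [0] * num_classes
    counts = [0] * num_classes
    for pred, y in zip(pred_sets, labels):
        counts[y] += 1
        hits[y] += int(y in pred)

    # Classes that never appear in the test data have undefined coverage.
    class_cov = [h / c if c > 0 else None for h, c in zip(hits, counts)]
    observed = [c for c in class_cov if c is not None]

    return {
        "marginal": sum(hits) / len(labels),
        "class_conditional": class_cov,
        "macro": sum(observed) / len(observed),
        "avg_set_size": sum(len(p) for p in pred_sets) / len(pred_sets),
    }

# Toy example (hypothetical): four common-class points, one rare-class point.
labels = [0, 0, 0, 0, 1]
pred_sets = [[0], [0], [0], [0, 1], [0]]
metrics = coverage_metrics(pred_sets, labels, num_classes=2)
```

In this toy example the sets cover every common-class point but miss the single rare-class point, so marginal coverage is 0.8 while macro-coverage is only 0.5. A marginal guarantee alone does not prevent rare classes from being systematically omitted.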

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prevalence-adjusted softmax for macro-coverage

Label-weighted conformal prediction method

Trade-off between set size and coverage
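
The two ideas listed above can be sketched roughly as follows. Neither formula is given on this page, so both functions are illustrative assumptions and not the paper's exact procedures. The first assumes that prevalence-adjusted softmax divides each softmax probability by the class's empirical prevalence and negates the result, so that smaller scores mean more plausible labels. The second is a simplified, hypothetical form of label weighting: when computing the threshold for a class, calibration points with that label get weight 1 and all other points get a weight `other_weight` between 0 and 1. No coverage guarantee is claimed for this simplified version.

```python
import math

def prevalence_adjusted_scores(probs, prevalence):
    """Assumed form of the score: -softmax(y | x) / prevalence(y)."""
    return [-p / q for p, q in zip(probs, prevalence)]

def label_weighted_threshold(cal_scores, cal_labels, target, alpha, other_weight):
    """Weighted conformal quantile for class `target`.

    other_weight = 1.0 recovers the marginal threshold;
    other_weight = 0.0 recovers the classwise threshold.
    """
    weighted = sorted(
        (s, 1.0 if y == target else other_weight)
        for s, y in zip(cal_scores, cal_labels)
    )
    # One extra unit of mass at +inf stands in for the unseen test point.
    total = sum(w for _, w in weighted) + 1.0
    cumulative = 0.0
    for score, weight in weighted:
        cumulative += weight
        if cumulative >= (1 - alpha) * total:
            return score
    return math.inf

# A rare class can receive a better (lower) score than a common class
# even when its raw softmax probability is smaller.
pas = prevalence_adjusted_scores(probs=[0.75, 0.25], prevalence=[0.875, 0.125])

# Toy calibration data (hypothetical): class 1 has a single example.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cal_labels = [0] * 8 + [1]
thresholds = {
    w: label_weighted_threshold(cal_scores, cal_labels, target=1,
                                alpha=0.25, other_weight=w)
    for w in (1.0, 0.5, 0.0)
}
```

In this toy setup, an intermediate weight gives the rare class a finite threshold that is looser than the marginal one but not infinite like the classwise one. That is the kind of smooth trade-off between set size and class-conditional coverage the abstract describes.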
