🤖 AI Summary
This paper tackles severe class imbalance and open-set classification, where the label space is unknown a priori and novel classes emerge at test time, causing insufficient coverage and overly conservative predictions. It proposes a calibration-preserving method for constructing prediction sets that (1) introduces a family of statistically valid conformal p-values for testing whether a test point belongs to a novel class, establishing a theoretical connection to Good–Turing estimation; (2) designs a selective sample-partitioning algorithm based on label frequency; and (3) incorporates a reweighting mechanism that strictly maintains finite-sample calibration. The authors prove the statistical optimality of the proposed p-values. Experiments on synthetic and real-world datasets show that the method achieves valid coverage over infinitely many potential classes while significantly improving both predictive accuracy and set compactness under extreme class imbalance.
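To make the p-value idea concrete, here is a minimal sketch of a standard split-conformal p-value, which is the generic construction underlying tests of this kind (the paper's actual p-value family is more refined; the score function and names here are illustrative assumptions):

```python
import numpy as np

def conformal_pvalue(cal_scores, test_score):
    """Split-conformal p-value: the fraction of calibration nonconformity
    scores at least as extreme as the test score, with a +1 correction
    that yields finite-sample validity under exchangeability."""
    cal_scores = np.asarray(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)

# A test point whose score is far beyond anything seen in calibration
# receives a small p-value, flagging it as a potential novel class.
rng = np.random.default_rng(0)
cal = rng.normal(0.0, 1.0, size=99)   # scores from previously seen classes
print(conformal_pvalue(cal, test_score=5.0))
```

Rejecting when this p-value falls below a level alpha controls the false-alarm rate at alpha in finite samples, which is the kind of guarantee the paper builds on.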
📝 Abstract
This paper presents a conformal prediction method for classification in highly imbalanced and open-set settings, where there are many possible classes and not all may be represented in the data. Existing approaches require a finite, known label space and typically involve random sample splitting, which works well when there is a sufficient number of observations from each class. Consequently, they have two limitations: (i) they fail to provide adequate coverage when encountering new labels at test time, and (ii) they may become overly conservative when predicting previously seen labels. To obtain valid prediction sets in the presence of unseen labels, we compute and integrate into our predictions a new family of conformal p-values that can test whether a new data point belongs to a previously unseen class. We study these p-values theoretically, establishing their optimality, and uncover an intriguing connection with the classical Good--Turing estimator for the probability of observing a new species. To make more efficient use of imbalanced data, we also develop a selective sample splitting algorithm that partitions training and calibration data based on label frequency, leading to more informative predictions. Although this breaks exchangeability, finite-sample guarantees are maintained through suitable re-weighting. With both simulated and real data, we demonstrate that our method leads to prediction sets with valid coverage even in challenging open-set scenarios with infinitely many possible labels, and produces more informative predictions under extreme class imbalance.
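The Good--Turing connection mentioned above can be illustrated with the classical estimator itself: the probability that the next observation belongs to a previously unseen class is estimated by the fraction of singletons in the sample. This sketch shows only the textbook estimator, not the paper's refined p-values; the function name is an assumption:

```python
from collections import Counter

def good_turing_new_class_prob(labels):
    """Classical Good-Turing estimate of the probability that the next
    observation is a previously unseen class: N1 / n, where N1 is the
    number of classes observed exactly once in a sample of size n."""
    counts = Counter(labels)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(labels)

labels = ["a", "a", "b", "c", "c", "c", "d"]
# Singletons are "b" and "d", so N1 = 2 and n = 7.
print(good_turing_new_class_prob(labels))  # 2/7
```

Intuitively, rare classes in the calibration data are the best evidence about how often entirely new classes will appear at test time, which is why this estimator arises naturally in the open-set setting.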