🤖 AI Summary
To address the loss of coverage guarantees that conformal prediction suffers under data poisoning attacks, this paper proposes RPS—the first provably reliable conformal prediction framework. RPS guarantees statistical coverage at any user-specified confidence level, even under worst-case contamination of both the training and calibration sets. Its core innovations are: (1) a smoothed scoring function that suppresses the influence of anomalous scores; (2) a data-partitioning-based model ensembling and subset-wise calibration mechanism; and (3) a majority-voting aggregation strategy for robust prediction set construction. The authors provide a theoretical analysis establishing finite-sample reliability guarantees for RPS. Empirical evaluation on image classification tasks demonstrates that RPS significantly improves poisoning robustness while maintaining high coverage and compact prediction sets—outperforming existing conformal methods under adversarial data corruption.
📝 Abstract
Conformal prediction provides model-agnostic and distribution-free uncertainty quantification through prediction sets that are guaranteed to include the ground truth with any user-specified probability. Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data, which can significantly alter prediction sets in practice. As a solution, we propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning. To ensure reliability under training poisoning, we introduce smoothed score functions that reliably aggregate predictions of classifiers trained on distinct partitions of the training data. To ensure reliability under calibration poisoning, we construct multiple prediction sets, each calibrated on distinct subsets of the calibration data. We then aggregate them into a majority prediction set, which includes a class only if it appears in a majority of the individual sets. Both proposed aggregations mitigate the influence of datapoints in the training and calibration data on the final prediction set. We experimentally validate our approach on image classification tasks, achieving strong reliability while maintaining utility and preserving coverage on clean data. Overall, our approach represents an important step towards more trustworthy uncertainty quantification in the presence of data poisoning.
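The majority-vote aggregation described above can be sketched in a few lines: given K prediction sets, each calibrated on a distinct subset of the calibration data, a class enters the final set only if it appears in more than half of them. The sketch below is illustrative only and assumes a simple class-count formulation; the paper's actual construction (including how the per-subset coverage levels are chosen so that the majority set retains the target guarantee) may differ.

```python
import numpy as np

def majority_prediction_set(prediction_sets, num_classes):
    """Aggregate K conformal prediction sets into a single majority set.

    A class is included in the output only if it appears in strictly
    more than half of the individual prediction sets, which bounds the
    influence any one calibration subset has on the final set.
    """
    counts = np.zeros(num_classes, dtype=int)
    for s in prediction_sets:
        for c in s:
            counts[c] += 1
    threshold = len(prediction_sets) / 2
    return {c for c in range(num_classes) if counts[c] > threshold}

# Hypothetical example: 5 prediction sets over 4 classes.
# Classes 0 and 1 each appear in 4 of the 5 sets; classes 2 and 3 in only 1.
sets = [{0, 1}, {0, 2}, {0, 1, 3}, {1}, {0, 1}]
print(majority_prediction_set(sets, 4))  # -> {0, 1}
```

Because a poisoned calibration subset can perturb at most one of the K individual sets, it can add or remove a class from the majority set only when that class's vote count already sits exactly at the threshold, which is the intuition behind the reliability guarantee.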