Adaptive Conformal Prediction for Reliable and Explainable Medical Image Classification

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the safety risks posed by overconfident deep learning models in medical image classification, particularly in diagnostically ambiguous cases, where existing conformal prediction methods can fail to provide reliable coverage on difficult samples. The authors propose an Adaptive Lambda Criterion for the Regularized Adaptive Prediction Sets (RAPS) framework that minimizes the worst-case coverage violation across prediction set size strata, rather than optimizing average efficiency, which can hide localized failures. Grad-CAM analysis is used to examine interpretability. On OrganAMNIST, the method achieves 95.72% global coverage with an average prediction set size of 1.09 while keeping coverage in every stratum at or above 90%. Cross-domain validation on PathMNIST supports its generalizability, and the Grad-CAM analysis shows that multi-label predictions correspond to focused attention on anatomically ambiguous regions.
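One plausible way to write a worst-case coverage objective of this kind is sketched below. The notation is an assumption made for illustration and is not taken from the paper: \Lambda is a candidate grid of lambda values, S_\lambda is the set of prediction set sizes observed under \lambda, \widehat{\mathrm{Cov}}_s(\lambda) is the empirical coverage among samples whose prediction set has size s, and 1 - \alpha is the target coverage.

```latex
\lambda^{\star}
  = \arg\min_{\lambda \in \Lambda}\;
    \max_{s \in S_{\lambda}}\;
    \Bigl[(1 - \alpha) - \widehat{\mathrm{Cov}}_{s}(\lambda)\Bigr]_{+},
\qquad [x]_{+} = \max(0, x)
```

By contrast, a size-optimized criterion would minimize the mean prediction set size over the same grid, which says nothing about how coverage is distributed across strata.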

📝 Abstract

Deep learning models for medical imaging often exhibit overconfidence, creating safety risks in ambiguous diagnostic scenarios. While Conformal Prediction (CP) provides distribution-free statistical guarantees, standard methods such as Regularized Adaptive Prediction Sets (RAPS) optimize for average efficiency and can mask severe failures on difficult inputs. We propose an Adaptive Lambda Criterion for RAPS that minimizes the worst-case coverage violation across prediction set size strata. On OrganAMNIST (58,850 abdominal CT images, 11 classes), standard size-optimized RAPS converges to near-deterministic behavior with stratified undercoverage on uncertain samples, while our method achieves 95.72 percent global coverage with average set size 1.09 and at least 90 percent coverage across all strata. Cross-domain validation on PathMNIST (107,180 pathology images, 9 classes) confirms generalizability. Quantitative Grad-CAM analysis (rho = -0.30, p < 1e-22) shows that multi-label predictions correspond to focused attention on anatomically ambiguous regions. These results demonstrate that the proposed method improves reliability while maintaining efficiency, making it suitable for safety-critical medical AI applications.
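The selection idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the RAPS score is the non-randomized variant, and the function names, the default `k_reg=1`, the target `alpha=0.05`, and the tie-break on mean set size are all hypothetical choices made for the example.

```python
import numpy as np

def raps_scores(probs, lam, k_reg):
    """Non-randomized RAPS score for every class: sorted cumulative
    probability up to and including the class, plus a rank penalty."""
    order = np.argsort(-probs, axis=1)                 # descending probability
    sorted_p = np.take_along_axis(probs, order, axis=1)
    ranks = np.arange(1, probs.shape[1] + 1)           # 1-indexed ranks
    sorted_scores = np.cumsum(sorted_p, axis=1) + lam * np.maximum(0, ranks - k_reg)
    scores = np.empty_like(sorted_scores)
    np.put_along_axis(scores, order, sorted_scores, axis=1)  # undo the sort
    return scores

def calibrate(probs, labels, lam, k_reg, alpha):
    """Conformal quantile of the true-label scores on calibration data."""
    s = raps_scores(probs, lam, k_reg)[np.arange(len(labels)), labels]
    n = len(s)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(s, level, method="higher")

def predict_sets(probs, qhat, lam, k_reg):
    """Boolean mask of shape (n_samples, n_classes); True means 'in the set'."""
    return raps_scores(probs, lam, k_reg) <= qhat

def worst_stratum_violation(sets, labels, alpha, min_count=1):
    """Largest shortfall below 1 - alpha over strata defined by set size.
    (Assumed definition of 'coverage violation'; clipped at zero.)"""
    sizes = sets.sum(axis=1)
    covered = sets[np.arange(len(labels)), labels]
    worst = 0.0
    for size in np.unique(sizes):
        mask = sizes == size
        if mask.sum() >= min_count:                    # skip tiny strata if desired
            worst = max(worst, (1 - alpha) - covered[mask].mean())
    return worst

def select_lambda(probs_cal, y_cal, probs_val, y_val, lambdas, k_reg=1, alpha=0.05):
    """Pick the lambda with the smallest worst-stratum violation,
    breaking ties by smaller mean set size (assumed tie-break)."""
    best = None
    for lam in lambdas:
        qhat = calibrate(probs_cal, y_cal, lam, k_reg, alpha)
        sets = predict_sets(probs_val, qhat, lam, k_reg)
        key = (worst_stratum_violation(sets, y_val, alpha), sets.sum(axis=1).mean())
        if best is None or key < best[0]:
            best = (key, lam, qhat)
    return best[1], best[2]
```

Here `probs_cal` and `probs_val` would be softmax outputs on two held-out splits. Because lambda is chosen using data, a conservative workflow would recompute the quantile on a further split that played no part in the selection before reporting coverage.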

Problem

Research questions and friction points this paper is trying to address.

Conformal Prediction

Medical Image Classification

Overconfidence

Coverage Violation

Safety-critical AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Conformal Prediction

Worst-case Coverage

Medical Image Classification