🤖 AI Summary
This study investigates how data augmentation strategies affect the performance of conformal prediction in diabetic retinopathy (DR) grading, specifically evaluating their impact on key uncertainty quantification metrics: empirical coverage, prediction set size, and efficiency. Using the DDR dataset with ResNet-50 and CoaT backbones, we systematically assess five augmentation strategies: no augmentation, geometric transformations, CLAHE, Mixup, and CutMix. Results show that Mixup and CutMix significantly improve both coverage reliability and predictive efficiency, whereas CLAHE may impair confidence calibration. Notably, this work is the first to demonstrate that sample-mixing augmentations not only enhance classification accuracy but also jointly improve the statistical validity and efficiency of conformal prediction. These findings provide empirical evidence and methodological guidance for the co-design of augmentation techniques and uncertainty quantification in trustworthy medical AI systems.
📝 Abstract
The clinical deployment of deep learning models for high-stakes tasks such as diabetic retinopathy (DR) grading requires demonstrable reliability. While such models achieve high accuracy, their clinical utility is limited by a lack of robust uncertainty quantification. Conformal prediction (CP) offers a distribution-free framework to generate prediction sets with statistical guarantees of coverage. However, the interaction between standard training practices like data augmentation and the validity of these guarantees is not well understood. In this study, we systematically investigate how different data augmentation strategies affect the performance of conformal predictors for DR grading. Using the DDR dataset, we evaluate two backbone architectures -- ResNet-50 and a Co-Scale Conv-Attentional Transformer (CoaT) -- trained under five augmentation regimes: no augmentation, standard geometric transforms, CLAHE, Mixup, and CutMix. We analyze the downstream effects on conformal metrics, including empirical coverage, average prediction set size, and correct efficiency. Our results demonstrate that sample-mixing strategies like Mixup and CutMix not only improve predictive accuracy but also yield more reliable and efficient uncertainty estimates. Conversely, methods like CLAHE can negatively impact model certainty. These findings highlight the need to co-design augmentation strategies with downstream uncertainty quantification in mind to build genuinely trustworthy AI systems for medical imaging.
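For readers unfamiliar with conformal prediction, the metrics above (empirical coverage and average set size) come from a simple calibration procedure. Below is a minimal sketch of split conformal prediction for a classifier's softmax outputs; the function and variable names are illustrative and not taken from the paper, which does not specify its exact CP implementation.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Build split conformal prediction sets from held-out calibration data.

    cal_probs:  (n, K) softmax outputs on the calibration split
    cal_labels: (n,) integer true labels for the calibration split
    test_probs: (m, K) softmax outputs for test images
    alpha:      target miscoverage rate (0.1 -> 90% coverage)
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the softmax probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile with the finite-sample correction ceil((n+1)(1-alpha))/n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    # Prediction set: every class whose nonconformity score is within the threshold.
    # Under exchangeability, P(true label in set) >= 1 - alpha, distribution-free.
    return [np.where(1.0 - p <= q)[0].tolist() for p in test_probs]
```

Empirical coverage is then the fraction of test images whose true grade lands in its set, and efficiency is the average set size; the study's finding is that the choice of training-time augmentation shifts both, even though the coverage guarantee itself is model-agnostic.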