🤖 AI Summary
This work addresses the challenges of inaccurate predictions and miscalibrated confidence in fine-grained medical image classification, where inter-class differences are subtle and visual appearances ambiguous. To this end, we propose a teacher-guided dual-path multi-prototype retrieval-augmented framework that synergistically combines discriminative classification with similarity-based prototype retrieval. The approach features a plug-and-play dual-path architecture, a multi-prototype memory bank constructed via an exponential moving average (EMA) teacher model, and a conservative confidence-gated fusion mechanism activated only when classification uncertainty coincides with conflicting retrieval evidence. Joint optimization employs cross-entropy and supervised contrastive losses within a cosine-compatible embedding space for prototype matching. Evaluated on HAM10000 and ISIC2019, the method consistently improves average accuracy by 0.21%–2.69% across five backbone networks, with visualizations confirming substantially enhanced discriminability on ambiguous cases.
📝 Abstract
Fine-grained medical image classification is challenged by subtle inter-class variations and visually ambiguous cases, where classifiers tend to produce uncertain rather than overconfident confidence estimates. In such scenarios, purely discriminative classifiers may achieve high overall accuracy yet still fail to distinguish between highly similar categories, leading to miscalibrated predictions. We propose T-DuMpRa, a teacher-guided dual-path multi-prototype retrieval-augmented framework in which discriminative classification and multi-prototype retrieval jointly drive both training and prediction. During training, we jointly optimize cross-entropy and supervised contrastive objectives to learn a cosine-compatible embedding geometry for reliable prototype matching. We further employ an exponential moving average (EMA) teacher to obtain smoother representations, and build a multi-prototype memory bank by clustering the teacher's embeddings. Our framework is plug-and-play: it can be integrated into existing classification models by constructing a compact prototype bank, thereby improving performance on visually ambiguous cases. At inference, we combine the classifier's predicted distribution with a similarity-based distribution computed via cosine matching to prototypes, and apply a conservative confidence-gated fusion that activates retrieval only when the classifier's prediction is uncertain and the retrieval evidence is both decisive and conflicting; otherwise, confident predictions are kept unchanged. On HAM10000 and ISIC2019, our method yields improvements of 0.21%-0.68% and 0.44%-2.69%, respectively, across five different backbones. Visualization analysis further indicates that the framework improves the handling of visually ambiguous cases.
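The inference-time gating described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds `tau_cls` and `tau_ret`, the mixing weight `alpha`, the softmax temperature, the max-over-prototypes class score, and the linear mixing rule are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def retrieval_distribution(embedding, prototype_bank, temperature=0.1):
    """Similarity-based class distribution from a multi-prototype bank.

    prototype_bank[c] holds the prototypes of class c. Scoring each class
    by its best-matching prototype is an assumption of this sketch.
    """
    class_scores = [
        max(cosine(embedding, proto) for proto in protos)
        for protos in prototype_bank
    ]
    return softmax(class_scores, temperature)


def gated_fusion(p_cls, p_ret, tau_cls=0.6, tau_ret=0.8, alpha=0.5):
    """Conservative gate: use retrieval only when the classifier is
    uncertain AND the retrieval evidence is decisive AND the two disagree.

    tau_cls, tau_ret and alpha are hypothetical values, not from the paper.
    """
    cls_top = max(range(len(p_cls)), key=p_cls.__getitem__)
    ret_top = max(range(len(p_ret)), key=p_ret.__getitem__)

    uncertain = p_cls[cls_top] < tau_cls
    decisive = p_ret[ret_top] > tau_ret
    conflicting = cls_top != ret_top

    if uncertain and decisive and conflicting:
        # Assumed fusion rule: convex combination of the two distributions.
        return [(1 - alpha) * c + alpha * r for c, r in zip(p_cls, p_ret)]
    # Otherwise the classifier's prediction is kept unchanged.
    return list(p_cls)


# Toy example: two classes, class 0 has two prototypes, class 1 has one.
bank = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0]]]
p_ret = retrieval_distribution([0.0, 1.0], bank)
fused = gated_fusion([0.55, 0.45], p_ret)   # uncertain and conflicting
kept = gated_fusion([0.9, 0.1], p_ret)      # confident, left unchanged
```

In this toy case the gate fires only for the first call, where the classifier's top probability falls below `tau_cls` and the retrieval path strongly favors the other class; the confident prediction passes through untouched, which mirrors the conservative behavior the abstract describes.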