🤖 AI Summary
This study addresses the critical yet often overlooked issue of annotation noise in medical image classification, where conventional noise-robust methods prioritize overall accuracy while neglecting the severe clinical consequences of high-risk misdiagnoses such as false negatives. For the first time, it systematically evaluates prominent robust learning approaches (Co-teaching, DivideMix, UNICON, and GMM filtering) from a clinical safety perspective on the DermaMNIST and PathMNIST benchmarks. The work proposes integrating cost-sensitive optimization into noise-robust training by incorporating a global risk metric that imposes substantially higher penalties on false negatives. Experimental results demonstrate that existing methods fail to ensure clinical safety, whereas the proposed paradigm effectively reduces high-risk diagnostic errors without compromising overall model performance.
📝 Abstract
Noisy labels are a pervasive challenge in medical image classification, where annotation errors arise from inter-observer variability and diagnostic ambiguity. Although several noise-robust learning methods have been proposed, their evaluation predominantly relies on accuracy-oriented metrics, overlooking the clinical implications of asymmetric error costs. In medical diagnosis, a false negative (missed disease) carries substantially higher consequences than a false positive (false alarm), as delayed treatment can directly impact patient outcomes. In this work, we investigate whether noise-robust training methods preserve clinical safety under label noise. We conduct a systematic risk-aware evaluation of state-of-the-art noise-robust methods (Co-teaching, DivideMix, UNICON, and a GMM-based filtering approach) on binarized DermaMNIST and PathMNIST datasets with clean labels and with label noise rates of 20% and 40%. Beyond balanced accuracy, we adopt a cost-sensitive Global Risk formulation that explicitly penalizes false negatives. Our analysis reveals that the robustness of state-of-the-art methods does not guarantee clinical safety. Furthermore, we demonstrate that integrating cost-sensitive optimization into noise-robust training significantly reduces clinical risk while maintaining model utility. These findings show that noise-robust learning must be evaluated through a clinical risk lens, and that combining robust training with cost-sensitive optimization can meaningfully reduce risk in noisy-label medical imaging scenarios.
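To illustrate why balanced accuracy alone can hide clinically relevant differences, the following is a minimal sketch of a cost-weighted risk score for binary predictions. It is not the paper's Global Risk formulation, which the abstract does not specify; the function names, the per-sample averaging, and the cost values (`c_fn=10.0`, `c_fp=1.0`) are assumptions chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = disease, 0 = healthy)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (sensitivity + specificity)


def global_risk(y_true, y_pred, c_fn=10.0, c_fp=1.0):
    """Hypothetical cost-weighted risk: average error cost per sample.

    c_fn and c_fp are illustrative costs (assumed, not from the paper);
    c_fn > c_fp encodes that a missed disease is worse than a false alarm.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    return (c_fn * fn + c_fp * fp) / n


# Toy example: two classifiers with the same balanced accuracy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 0, 0, 0, 0, 0, 0]  # two false negatives, no false positives
pred_b = [1, 1, 1, 1, 1, 1, 0, 0]  # no false negatives, two false positives

print(balanced_accuracy(y_true, pred_a), global_risk(y_true, pred_a))
print(balanced_accuracy(y_true, pred_b), global_risk(y_true, pred_b))
```

In this toy case both classifiers reach the same balanced accuracy, yet the one that misses disease cases incurs a much higher risk under the assumed costs, which is the kind of gap an accuracy-oriented evaluation would not reveal.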