Benchmarking Real-World Medical Image Classification with Noisy Labels: Challenges, Practice, and Outlook

📅 2025-12-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Medical image annotation heavily relies on expert knowledge, inevitably introducing label noise and inconsistency; however, existing label-noise learning (LNL) methods lack systematic robustness evaluation in real-world clinical settings. To address this gap, we introduce LNMBench—the first comprehensive benchmark for label noise in medical imaging—encompassing seven datasets, six imaging modalities, and three types of realistic noise patterns. We systematically evaluate ten state-of-the-art LNL methods, revealing substantial performance degradation under high noise levels, severe class imbalance, and cross-domain generalization (average accuracy drop >15%). To mitigate these issues, we propose a lightweight robustness-enhancement strategy. Furthermore, we publicly release a unified evaluation framework and fully reproducible codebase. Our method consistently improves accuracy by 2–5 percentage points across most datasets, advancing standardization and practical deployment of LNL in medical imaging.

Technology Category

Application Category

📝 Abstract
Learning from noisy labels remains a major challenge in medical image analysis, where annotation demands expert knowledge and substantial inter-observer variability often leads to inconsistent or erroneous labels. Despite extensive research on learning with noisy labels (LNL), the robustness of existing methods in medical imaging has not been systematically assessed. To address this gap, we introduce LNMBench, a comprehensive benchmark for Label Noise in Medical imaging. LNMBench encompasses extbf{10} representative methods evaluated across 7 datasets, 6 imaging modalities, and 3 noise patterns, establishing a unified and reproducible framework for robustness evaluation under realistic conditions. Comprehensive experiments reveal that the performance of existing LNL methods degrades substantially under high and real-world noise, highlighting the persistent challenges of class imbalance and domain variability in medical data. Motivated by these findings, we further propose a simple yet effective improvement to enhance model robustness under such conditions. The LNMBench codebase is publicly released to facilitate standardized evaluation, promote reproducible research, and provide practical insights for developing noise-resilient algorithms in both research and real-world medical applications.The codebase is publicly available on https://github.com/myyy777/LNMBench.
Problem

Research questions and friction points this paper is trying to address.

Benchmarks noisy label learning in medical imaging
Assesses robustness of existing methods under realistic noise
Proposes improvement for model robustness in medical data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces LNMBench benchmark for medical label noise
Evaluates 10 methods across diverse datasets and noise patterns
Proposes simple improvement to enhance model robustness
🔎 Similar Papers
No similar papers found.
Y
Yuan Ma
Japan Advanced Institute of Science and Technology, Nomi, Japan
Junlin Hou
Junlin Hou
HKUST | Fudan University
Computer VisionMedical Image AnalysisLabel-efficient Deep LearningeXplainable AI
C
Chao Zhang
University of Toyama, Toyama, Japan
Y
Yukun Zhou
University College London, London, United Kingdom
Z
Zongyuan Ge
Monash University, Melbourne, Australia
H
Haoran Xie
Japan Advanced Institute of Science and Technology, Nomi, Japan
Lie Ju
Lie Ju
University College London; Moorfields Eye Hospital; Monash University
Computer VisionMedical Image AnalysisOphthalmology