🤖 AI Summary
Medical image annotation relies heavily on expert knowledge, and inter-observer variability inevitably introduces label noise and inconsistency; yet existing label-noise learning (LNL) methods lack systematic robustness evaluation in realistic clinical settings. To address this gap, we introduce LNMBench—the first comprehensive benchmark for label noise in medical imaging—encompassing seven datasets, six imaging modalities, and three types of realistic noise patterns. We systematically evaluate ten state-of-the-art LNL methods, revealing substantial performance degradation under high noise levels, severe class imbalance, and cross-domain generalization (average accuracy drop >15%). To mitigate these issues, we propose a lightweight robustness-enhancement strategy, and we publicly release a unified evaluation framework with a fully reproducible codebase. Our method consistently improves accuracy by 2–5 percentage points across most datasets, advancing the standardization and practical deployment of LNL in medical imaging.
📝 Abstract
Learning from noisy labels remains a major challenge in medical image analysis, where annotation demands expert knowledge and substantial inter-observer variability often leads to inconsistent or erroneous labels. Despite extensive research on learning with noisy labels (LNL), the robustness of existing methods in medical imaging has not been systematically assessed. To address this gap, we introduce LNMBench, a comprehensive benchmark for Label Noise in Medical imaging. LNMBench encompasses 10 representative methods evaluated across 7 datasets, 6 imaging modalities, and 3 noise patterns, establishing a unified and reproducible framework for robustness evaluation under realistic conditions. Comprehensive experiments reveal that the performance of existing LNL methods degrades substantially under high and real-world noise, highlighting the persistent challenges of class imbalance and domain variability in medical data. Motivated by these findings, we further propose a simple yet effective improvement to enhance model robustness under such conditions. The LNMBench codebase is publicly released to facilitate standardized evaluation, promote reproducible research, and provide practical insights for developing noise-resilient algorithms in both research and real-world medical applications. It is available at https://github.com/myyy777/LNMBench.
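The abstract does not name the three noise patterns, but benchmarks of this kind typically inject synthetic noise at controlled rates before training. As a minimal illustration (not LNMBench's actual implementation), the sketch below injects *symmetric* label noise — a standard setting in the LNL literature where each label is flipped to a uniformly random different class with probability equal to the noise rate. The function name and signature are hypothetical.

```python
import numpy as np

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Hypothetical helper: flip each label to a uniformly random
    *different* class with probability `noise_rate` (symmetric noise)."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip_mask = rng.random(len(noisy)) < noise_rate
    for i in np.flatnonzero(flip_mask):
        # choose any class except the current (clean) one
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy

clean = np.zeros(10_000, dtype=int)          # toy labels, all class 0
noisy = inject_symmetric_noise(clean, noise_rate=0.3, num_classes=5)
print((noisy != clean).mean())               # empirical flip rate, close to 0.3
```

Asymmetric (class-dependent) noise, another common pattern, would instead flip labels only between confusable class pairs via a fixed transition matrix.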