Benchmarking Real-World Medical Image Classification with Noisy Labels: Challenges, Practice, and Outlook

📅 2025-12-09

📈 Citations: 0

✨ Influential: 0

career value

200K/year

🤖 AI Summary

Medical image annotation heavily relies on expert knowledge, inevitably introducing label noise and inconsistency; however, existing label-noise learning (LNL) methods lack systematic robustness evaluation in real-world clinical settings. To address this gap, we introduce LNMBench—the first comprehensive benchmark for label noise in medical imaging—encompassing seven datasets, six imaging modalities, and three types of realistic noise patterns. We systematically evaluate ten state-of-the-art LNL methods, revealing substantial performance degradation under high noise levels, severe class imbalance, and cross-domain generalization (average accuracy drop >15%). To mitigate these issues, we propose a lightweight robustness-enhancement strategy. Furthermore, we publicly release a unified evaluation framework and fully reproducible codebase. Our method consistently improves accuracy by 2–5 percentage points across most datasets, advancing standardization and practical deployment of LNL in medical imaging.

Technology Category

Application Category

📝 Abstract

Learning from noisy labels remains a major challenge in medical image analysis, where annotation demands expert knowledge and substantial inter-observer variability often leads to inconsistent or erroneous labels. Despite extensive research on learning with noisy labels (LNL), the robustness of existing methods in medical imaging has not been systematically assessed. To address this gap, we introduce LNMBench, a comprehensive benchmark for Label Noise in Medical imaging. LNMBench encompasses extbf{10} representative methods evaluated across 7 datasets, 6 imaging modalities, and 3 noise patterns, establishing a unified and reproducible framework for robustness evaluation under realistic conditions. Comprehensive experiments reveal that the performance of existing LNL methods degrades substantially under high and real-world noise, highlighting the persistent challenges of class imbalance and domain variability in medical data. Motivated by these findings, we further propose a simple yet effective improvement to enhance model robustness under such conditions. The LNMBench codebase is publicly released to facilitate standardized evaluation, promote reproducible research, and provide practical insights for developing noise-resilient algorithms in both research and real-world medical applications.The codebase is publicly available on https://github.com/myyy777/LNMBench.

Problem

Research questions and friction points this paper is trying to address.

Benchmarks noisy label learning in medical imaging

Assesses robustness of existing methods under realistic noise

Proposes improvement for model robustness in medical data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces LNMBench benchmark for medical label noise

Evaluates 10 methods across diverse datasets and noise patterns

Proposes simple improvement to enhance model robustness

🔎 Similar Papers

LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models