🤖 AI Summary
Medical AI faces critical challenges, including data scarcity, high heterogeneity across imaging sources and clinical settings, and narrow evaluation paradigms. As a result, current research prioritizes marginal benchmark improvements over clinical utility. To address these challenges, we introduce MedMNIST+, the first large-scale, multimodal, clinically aligned standardized medical image benchmark. It covers 15 anatomical regions and 8 imaging modalities, comprises over 300,000 high-quality annotated samples, and supports 15 fundamental vision tasks. Key contributions include: (1) the first systematic integration of heterogeneous, multi-center imaging data with rich clinical metadata; (2) a task-hierarchical protocol and fairness-aware evaluation framework to mitigate dataset bias and overfitting; and (3) a cross-modal normalization preprocessing pipeline. Extensive validation across more than 20 state-of-the-art models demonstrates MedMNIST+’s strong discriminative power and robustness. It has emerged as the de facto foundational benchmark for medical AI prototyping and cross-task generalization assessment.