🤖 AI Summary
Medical image anomaly detection (AD) has long lacked fair and comprehensive evaluation benchmarks, hindering reproducibility, comparability, and progress. To address this, we introduce a unified benchmark for medical AD encompassing seven diverse datasets, five imaging modalities, and thirty state-of-the-art methods, systematically evaluated on both image-level anomaly classification and pixel-level anomaly segmentation. For the first time, we conduct a component-level analysis of reconstruction-based models (e.g., VAEs, GANs), self-supervised learning approaches (e.g., DINO, MAE), and emerging vision representation methods. Our analysis reveals critical bottlenecks: poor cross-modal generalization, difficulty in localizing anomalies under limited supervision, and suboptimal fine-grained segmentation on histopathological slides. We publicly release all data, code, and a standardized evaluation protocol, establishing strong reference baselines. This benchmark enhances reproducibility, fairness, and rigor, providing a foundational resource for future AD research.
📝 Abstract
Anomaly detection (AD) aims to detect abnormal samples that deviate from expected normal patterns. Because AD models can typically be trained on normal data alone, without requiring abnormal samples, they play an important role in rare-disease recognition and health screening in the medical domain. Despite the emergence of numerous methods for medical AD, we observe the lack of a fair and comprehensive evaluation, which leads to ambiguous conclusions and hinders the development of this field. To address this problem, this paper builds a benchmark for unified comparison. Seven medical datasets spanning five image modalities, including chest X-rays, brain MRIs, retinal fundus images, dermatoscopic images, and histopathology whole slide images, are curated for extensive evaluation. Thirty typical AD methods, including reconstruction-based and self-supervised learning-based methods, are compared on image-level anomaly classification and pixel-level anomaly segmentation. Furthermore, for the first time, we formally explore the effect of key components in existing methods, clearly revealing unresolved challenges and potential future directions. The datasets and code are available at https://github.com/caiyu6666/MedIAnomaly.
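The reconstruction paradigm evaluated in the benchmark can be illustrated with a minimal sketch: fit a model of "normal" appearance on normal data only, then score test samples by how poorly they are reconstructed. The sketch below is an assumption-laden toy (PCA on flattened synthetic "images" stands in for a deep autoencoder; all data and names are made up), but it shows both outputs the abstract distinguishes: an image-level anomaly score and a pixel-level anomaly map.

```python
import numpy as np

def fit_pca(normal: np.ndarray, k: int):
    """Learn a low-rank 'normal' subspace from flattened normal images."""
    mean = normal.mean(axis=0)
    # SVD of the centered data; the top-k right singular vectors
    # span the directions of normal variation.
    _, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
    return mean, vt[:k]

def anomaly_scores(images: np.ndarray, mean: np.ndarray, comps: np.ndarray):
    """Reconstruct from the normal subspace; score by reconstruction error."""
    recon = (images - mean) @ comps.T @ comps + mean
    pixel_err = (images - recon) ** 2          # pixel-level anomaly map
    return pixel_err.mean(axis=1), pixel_err   # image-level score, map

# Synthetic data: "normal" images are a smooth ramp plus noise.
rng = np.random.default_rng(0)
ramp = np.linspace(0.0, 1.0, 64)
normal = ramp + rng.normal(0.0, 0.1, size=(200, 64))
test_normal = ramp + rng.normal(0.0, 0.1, size=(5, 64))
test_anom = test_normal.copy()
test_anom[:, 20:30] += 2.0  # inject a localized "lesion"

mean, comps = fit_pca(normal, k=5)
s_norm, _ = anomaly_scores(test_normal, mean, comps)
s_anom, pmap = anomaly_scores(test_anom, mean, comps)
```

Anomalous inputs fall outside the learned subspace, so their image-level scores exceed those of held-out normal samples, and the per-pixel error map peaks over the injected region, mirroring the classification and segmentation tasks compared in the benchmark.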