AI Summary
Retinal anomaly detection has long lacked a comprehensive, publicly available, and clinically relevant benchmark. Existing work is limited by narrow anomaly coverage, near-saturated test sets, inadequate generalization evaluation, and neglect of the full range of available supervision (normal, labeled abnormal, and unlabeled samples). To address these gaps, the authors introduce Retina-AD, a systematic, open-source benchmark covering multi-center, multimodal, and multi-disease clinical scenarios, with unified evaluation under both fully supervised and one-class learning paradigms. Benchmarking shows that a fully supervised approach based on disentangled representations of abnormalities (DRA) performs best overall but degrades sharply on certain unseen anomalies. The proposed NFM-DRA combines DRA with a Normal Feature Memory to mitigate this degradation and establishes a new state of the art. Both the codebase and dataset are publicly released to support fair, reproducible research.
Abstract
Retinal anomaly detection plays a pivotal role in screening for ocular and systemic diseases. Despite its significance, progress in the field has been hindered by the absence of a comprehensive and publicly available benchmark, which is essential for the fair evaluation and advancement of methodologies. Due to this limitation, previous anomaly detection work on retinal images has been constrained by (1) a limited and overly simplistic set of anomaly types, (2) test sets that are nearly saturated, and (3) a lack of generalization evaluation, resulting in less convincing experimental setups. Furthermore, existing benchmarks in medical anomaly detection predominantly focus on one-class supervised approaches (training only with negative samples), overlooking the vast amounts of labeled abnormal data and unlabeled data that are commonly available in clinical practice. To bridge these gaps, we introduce a benchmark for retinal anomaly detection that is comprehensive and systematic in terms of both data and algorithms. By categorizing and benchmarking previous methods, we find that a fully supervised approach leveraging disentangled representations of abnormalities (DRA) achieves the best performance but suffers significant performance drops when encountering certain unseen anomalies. Inspired by the memory bank mechanisms used in one-class supervised learning, we propose NFM-DRA, which integrates DRA with a Normal Feature Memory to mitigate this performance degradation, establishing a new state of the art (SOTA). The benchmark is publicly available at https://github.com/DopamineLcy/BenchReAD.
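The abstract does not specify how the Normal Feature Memory works, so the following is only a minimal sketch of the general memory-bank idea from one-class anomaly detection, not the authors' implementation. It assumes that features of normal images are stored in a bank, that a query is scored by its mean cosine distance to its k nearest stored features, and that this score is blended with a supervised score by a simple weighted sum. The function names (build_normal_memory, memory_anomaly_score, fuse_scores), the parameters k and alpha, and the fusion rule are all hypothetical.

```python
import math


def _unit(vector):
    """Return the L2-normalised copy of a feature vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def build_normal_memory(normal_features):
    """Store normalised features extracted from normal (negative) samples only."""
    return [_unit(f) for f in normal_features]


def memory_anomaly_score(query, memory, k=1):
    """Mean cosine distance from the query to its k nearest normal features.

    A larger value means the query is further from everything seen as normal.
    """
    if not memory:
        raise ValueError("memory must contain at least one normal feature")
    q = _unit(query)
    # Cosine distance = 1 - cosine similarity, since all vectors are unit length.
    distances = sorted(1.0 - sum(a * b for a, b in zip(q, m)) for m in memory)
    k = max(1, min(k, len(distances)))
    return sum(distances[:k]) / k


def fuse_scores(supervised_score, memory_score, alpha=0.5):
    """Hypothetical fusion: weighted sum of a supervised score and the memory score."""
    return alpha * supervised_score + (1.0 - alpha) * memory_score


# Usage: two normal features, one query close to normal and one far from it.
memory = build_normal_memory([[1.0, 0.0], [0.0, 3.0]])
near = memory_anomaly_score([2.0, 0.0], memory)   # same direction as a stored feature
far = memory_anomaly_score([-1.0, 0.0], memory)   # opposite to one, orthogonal to the other
combined = fuse_scores(0.4, far, alpha=0.5)
```

The point of the sketch is that the memory score depends only on normal data, so it can still flag an anomaly type that a supervised detector never saw during training. This is the kind of degradation on unseen anomalies that the abstract says NFM-DRA is meant to mitigate.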