🤖 AI Summary
Retinal color fundus image registration faces a severe scarcity of annotated data, and existing methods rely predominantly on supervised keypoint detection. This paper proposes the first end-to-end unsupervised deep registration framework, eliminating the dependency on annotations by introducing a novel "descriptor-driven keypoint detection" paradigm: (1) discriminative pixel-level descriptors are learned via unsupervised contrastive learning; (2) a self-consistent quality-assessment network guides differentiable keypoint localization. Descriptor quality and matching robustness are jointly optimized. Evaluated on four independent test sets, the learned descriptors surpass state-of-the-art supervised methods, the keypoint detector significantly outperforms existing unsupervised approaches, and overall registration accuracy matches top-performing supervised models. Moreover, the framework enables direct cross-modal transfer without fine-tuning.
📝 Abstract
Retinal image registration, particularly for color fundus images, is a challenging yet essential task with diverse clinical applications. Existing registration methods for color fundus images typically rely on keypoints and descriptors for alignment; however, a significant limitation is their reliance on labeled data, which is particularly scarce in the medical domain. In this work, we present a novel unsupervised registration pipeline that entirely eliminates the need for labeled data. Our approach is based on the principle that locations with distinctive descriptors constitute reliable keypoints. This fully inverts the conventional state-of-the-art approach, conditioning the detector on the descriptor rather than the reverse. First, we propose an innovative descriptor learning method that operates without keypoint detection or any labels, generating descriptors for arbitrary locations in retinal images. Next, we introduce a novel, label-free keypoint detector network that works by estimating descriptor performance directly from the input image. We validate our method through a comprehensive evaluation on four hold-out datasets, demonstrating that our unsupervised descriptor outperforms state-of-the-art supervised descriptors and that our unsupervised detector significantly outperforms existing unsupervised detection methods. Finally, our full registration pipeline achieves performance comparable to the leading supervised methods, while using no labeled data. Additionally, the label-free nature and design of our method enable direct adaptation to other domains and modalities.
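The core inversion described above — selecting keypoints wherever the descriptor is distinctive, rather than describing pre-detected keypoints — can be illustrated with a toy sketch. The snippet below is not the paper's method (which uses learned networks and a differentiable quality estimator); it is a minimal NumPy illustration, with a randomly generated stand-in for a dense descriptor map, of scoring each location by how poorly its descriptor matches anywhere else in the image and keeping the top-scoring locations as keypoints:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense descriptor map: one D-dim descriptor per pixel of an
# H x W grid (a stand-in for the output of a learned descriptor network).
H, W, D = 16, 16, 8
desc = rng.normal(size=(H, W, D))
desc /= np.linalg.norm(desc, axis=-1, keepdims=True)  # L2-normalize

# Distinctiveness proxy: a descriptor is distinctive if its best match among
# all *other* locations is weak, i.e. low maximum cosine similarity elsewhere.
flat = desc.reshape(-1, D)          # (H*W, D)
sim = flat @ flat.T                 # pairwise cosine similarities
np.fill_diagonal(sim, -np.inf)      # exclude trivial self-matches
distinctiveness = -sim.max(axis=1)  # high score = no near-duplicate exists

# "Descriptor-driven detection": keypoints are simply the k most
# distinctive locations, so the detector is conditioned on the descriptor.
k = 10
top = np.argsort(distinctiveness)[::-1][:k]
keypoints = np.stack(np.unravel_index(top, (H, W)), axis=1)  # (k, 2) rows/cols
```

A descriptor that is unique within the image is exactly one that can be matched unambiguously across images, which is why distinctiveness is a sensible proxy for keypoint reliability; the paper replaces this brute-force score with a network that predicts descriptor performance directly from the input image.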