🤖 AI Summary
Medical image registration lacks large-scale, unsupervised benchmarks for brain MRI, particularly for evaluating zero-shot generalization. Method: We propose LUMIR, the first unsupervised, zero-shot-generalizable benchmark for brain MRI registration, comprising over 4,000 unlabeled T1-weighted images and eliminating reliance on anatomical ground-truth labels. It introduces a systematic evaluation framework covering cross-modality, cross-disease, cross-device, and cross-species generalization. The accompanying end-to-end deep learning frameworks enforce diffeomorphic constraints, inverse-consistency regularization, and self-supervised similarity losses (normalized cross-correlation and mutual information). Results: Deep learning-based methods achieve superior performance over classical optimization-based approaches, attaining Dice scores >0.82 on in-domain registration and outperforming baselines on most zero-shot tasks. The estimated deformation fields are both diffeomorphic and anatomically plausible. LUMIR establishes a new paradigm for developing generalizable, verifiable foundation models in medical image registration.
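The diffeomorphism claim above is conventionally verified by checking that the Jacobian determinant of the deformation φ(x) = x + u(x) is positive at every voxel. A minimal NumPy sketch of that check (finite-difference gradients; the function name and array layout are illustrative, not from the LUMIR codebase):

```python
import numpy as np

def jacobian_determinant(disp):
    """Voxel-wise Jacobian determinant of phi(x) = x + u(x).

    `disp` has shape (3, D, H, W): one displacement component per
    spatial axis, in voxel units. A diffeomorphic deformation has
    det > 0 everywhere (no folding of the image domain).
    """
    # grads[i, j] = du_i / dx_j, via central differences (np.gradient).
    grads = np.stack([np.stack(np.gradient(disp[i]), axis=0)
                      for i in range(3)])
    # J[i, j] = d(phi_i)/dx_j = delta_ij + du_i/dx_j at every voxel.
    J = grads + np.eye(3)[:, :, None, None, None]
    # Cofactor expansion of the 3x3 determinant, vectorized over voxels.
    det = (J[0, 0] * (J[1, 1] * J[2, 2] - J[1, 2] * J[2, 1])
         - J[0, 1] * (J[1, 0] * J[2, 2] - J[1, 2] * J[2, 0])
         + J[0, 2] * (J[1, 0] * J[2, 1] - J[1, 1] * J[2, 0]))
    return det

# Sanity check: a uniform 10% dilation u(x) = 0.1 * x gives det = 1.1^3.
grid = np.indices((8, 8, 8)).astype(float)
det = jacobian_determinant(0.1 * grid)
print(det.min() > 0)  # True: no folding anywhere
```

Benchmarks in this area typically report the fraction of voxels with non-positive determinant as a plausibility metric alongside Dice.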
📝 Abstract
Medical imaging challenges have played a transformative role in advancing the field, catalyzing algorithmic innovation and establishing new performance standards across diverse clinical applications. Image registration, a foundational task in neuroimaging pipelines, has similarly benefited from the Learn2Reg initiative. Building on this foundation, we introduce the Large-scale Unsupervised Brain MRI Image Registration (LUMIR) challenge, a next-generation benchmark designed to assess and advance unsupervised brain MRI registration. Distinct from prior challenges that leveraged anatomical label maps for supervision, LUMIR removes this dependency by providing over 4,000 preprocessed T1-weighted brain MRIs for training without any label maps, encouraging biologically plausible deformation modeling through self-supervision. In addition to evaluating performance on 590 held-out test subjects, LUMIR introduces a rigorous suite of zero-shot generalization tasks, spanning out-of-domain imaging modalities (e.g., FLAIR, T2-weighted, T2*-weighted), disease populations (e.g., Alzheimer's disease), acquisition protocols (e.g., 9.4T MRI), and species (e.g., macaque brains). A total of 1,158 subjects and over 4,000 image pairs were included for evaluation. Performance was assessed using both segmentation-based metrics (Dice coefficient, 95th percentile Hausdorff distance) and landmark-based registration accuracy (target registration error). Across both in-domain and zero-shot tasks, deep learning-based methods consistently achieved state-of-the-art accuracy while producing anatomically plausible deformation fields. The top-performing deep learning-based models demonstrated diffeomorphic properties and inverse consistency, outperformed several leading optimization-based methods, and showed strong robustness to most domain shifts, the sole exception being a performance drop on out-of-domain contrasts.
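Of the three evaluation metrics named above, the Dice coefficient and target registration error (TRE) are straightforward to state in code. A minimal sketch, assuming integer label maps and landmark coordinates in voxel units (function names and the `spacing` parameter are illustrative, not the challenge's official evaluation code):

```python
import numpy as np

def dice(seg_a, seg_b, label=1):
    """Dice overlap for one label: 2|A ∩ B| / (|A| + |B|)."""
    a = (seg_a == label)
    b = (seg_b == label)
    denom = a.sum() + b.sum()
    # Convention: two empty masks count as perfect agreement.
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def tre(fixed_pts, warped_pts, spacing=(1.0, 1.0, 1.0)):
    """Mean target registration error between corresponding landmarks.

    `fixed_pts` and `warped_pts` are (N, 3) voxel coordinates; `spacing`
    converts voxel offsets to millimeters before taking the norm.
    """
    d = (np.asarray(fixed_pts, float) - np.asarray(warped_pts, float))
    d = d * np.asarray(spacing, float)
    return float(np.linalg.norm(d, axis=1).mean())

# Tiny demo: two half-overlapping binary masks.
seg_a = np.zeros((4, 4, 4), int); seg_a[:2] = 1
seg_b = np.zeros((4, 4, 4), int); seg_b[1:3] = 1
print(dice(seg_a, seg_b))  # one of two slices overlaps -> 0.5
```

The 95th-percentile Hausdorff distance used alongside these is usually computed from surface distance transforms (e.g., via `scipy.ndimage.distance_transform_edt`) and is omitted here for brevity.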