Guiding Registration with Emergent Similarity from Pre-Trained Diffusion Models

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical image registration suffers when conventional intensity-based similarity metrics (e.g., mutual information, normalized cross-correlation) fail on asymmetric anatomy—for instance, when a structure is present in one image but absent in the other. To address this, we propose an unsupervised, robust cross-modal registration method that leverages implicit semantic features extracted from pretrained diffusion models. We are the first to discover and exploit the anatomy-aware semantic representations that emerge in intermediate layers of diffusion models (e.g., Stable Diffusion), constructing a differentiable registration loss based on cosine similarity in the learned feature space—requiring neither task-specific supervision nor domain-specific fine-tuning. Integrated into deformable registration frameworks (e.g., VoxelMorph), our approach achieves substantial improvements over conventional methods on both cross-modal 2D registration (DXA/X-ray) and intra-modal 3D registration (brain-extracted/non-brain-extracted MRI): +12.7% Dice score and −23.4% target registration error. These results demonstrate the efficacy and generalizability of diffusion priors for enforcing anatomical consistency during image alignment.
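The core of the method is a registration loss built from cosine similarity between per-pixel feature vectors extracted from a diffusion model. A minimal sketch of such a loss is below; the feature extractor itself (which layer of which diffusion model, at what timestep) is abstracted away here, and the exact normalization and weighting may differ from the paper's implementation:

```python
import numpy as np

def cosine_feature_loss(feat_moving, feat_fixed, eps=1e-8):
    """Feature-space similarity loss: 1 minus mean per-pixel cosine similarity.

    feat_moving, feat_fixed: arrays of shape (C, H, W), e.g. intermediate
    activations from a pretrained diffusion model evaluated on the warped
    moving image and the fixed image (extractor is hypothetical here).
    Returns 0 when the feature maps align perfectly.
    """
    # Per-pixel dot product over the channel dimension.
    num = (feat_moving * feat_fixed).sum(axis=0)
    # Product of per-pixel feature norms; eps guards against division by zero.
    den = (np.linalg.norm(feat_moving, axis=0)
           * np.linalg.norm(feat_fixed, axis=0) + eps)
    cos_map = num / den  # (H, W) map of cosine similarities in [-1, 1]
    return 1.0 - cos_map.mean()
```

Because the loss is a smooth function of the feature maps, it can be backpropagated through the diffusion-model encoder and the spatial transform when implemented in an autodiff framework, which is what allows it to plug into networks like VoxelMorph in place of an intensity-based loss.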

📝 Abstract
Diffusion models, while trained for image generation, have emerged as powerful foundational feature extractors for downstream tasks. We find that off-the-shelf diffusion models, trained exclusively to generate natural RGB images, can identify semantically meaningful correspondences in medical images. Building on this observation, we propose to leverage diffusion model features as a similarity measure to guide deformable image registration networks. We show that common intensity-based similarity losses often fail in challenging scenarios, such as when certain anatomies are visible in one image but absent in another, leading to anatomically inaccurate alignments. In contrast, our method identifies true semantic correspondences, aligning meaningful structures while disregarding those not present across images. We demonstrate superior performance of our approach on two tasks: multimodal 2D registration (DXA to X-ray) and monomodal 3D registration (brain-extracted to non-brain-extracted MRI). Code: https://github.com/uncbiag/dgir
Problem

Research questions and friction points this paper is trying to address.

Using diffusion models for medical image registration
Addressing failures of intensity-based similarity in registration
Aligning semantic correspondences across different image modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pre-trained diffusion models for feature extraction
Guides registration with semantic similarity measures
Outperforms intensity-based methods in challenging scenarios