MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging

📅 2025-12-05
🤖 AI Summary
Medical image registration often fails in low-contrast or anatomically variable regions because conventional local intensity-based similarity metrics cannot model global semantic structure. To address this, we propose a zero-shot, unsupervised 3D registration framework that requires neither fine-tuning nor task-specific training. Our method leverages multi-scale intermediate activations from a pre-trained medical latent diffusion model as robust voxel-wise descriptors, enabling voxel-level correspondence estimation via cosine similarity. We further stabilize matching by integrating a local search prior and controlled noise injection. Despite requiring no dedicated model training, our approach significantly outperforms classical B-spline registration and achieves accuracy comparable to the state-of-the-art learned method UniGradICON on public lung CT datasets. This work establishes a new paradigm for efficient, generalizable, and clinically deployable medical image registration.
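The descriptor construction described above can be sketched as follows. This is a minimal illustration only: the level shapes, the nearest-neighbour upsampling, and the function name are assumptions, standing in for the paper's actual multi-scale activations from a pretrained latent diffusion model (the summary notes noise is injected at a modest diffusion timestep before the activations are read out).

```python
import numpy as np

def fuse_multiscale_features(feature_maps, target_shape):
    """Fuse per-level activations into one voxel-wise descriptor volume.

    feature_maps: list of arrays shaped (C_l, D_l, H_l, W_l), one per
    network level (here stand-ins for real diffusion activations).
    Each level is upsampled to target_shape (assumed to be an integer
    multiple of every level's spatial shape) by nearest-neighbour
    repetition, L2-normalised per voxel, and concatenated channel-wise.
    """
    fused = []
    for fm in feature_maps:
        up = fm
        for axis in range(1, 4):
            factor = target_shape[axis - 1] // fm.shape[axis]
            up = np.repeat(up, factor, axis=axis)
        # L2-normalise each voxel's descriptor so every level
        # contributes equally to a later cosine-similarity match.
        norm = np.linalg.norm(up, axis=0, keepdims=True) + 1e-8
        fused.append(up / norm)
    return np.concatenate(fused, axis=0)
```

Fusing, say, an 8-channel level at 4³ with a 16-channel level at 2³ to a 4³ target yields a 24-channel descriptor per voxel.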

📝 Abstract
Accurate spatial correspondence between medical images is essential for longitudinal analysis, lesion tracking, and image-guided interventions. Conventional medical image registration methods rely on local intensity-based similarity measures, which fail to capture global semantic structure and often yield mismatches in low-contrast or anatomically variable regions. Recent advances in diffusion models suggest that their intermediate representations encode rich geometric and semantic information. We present MedDIFT, a training-free 3D correspondence framework that leverages multi-scale features from a pretrained latent medical diffusion model as voxel descriptors. MedDIFT fuses diffusion activations into rich voxel-wise descriptors and matches them via cosine similarity, with an optional local-search prior. On a publicly available lung CT dataset, MedDIFT achieves correspondence accuracy comparable to the state-of-the-art learning-based UniGradICON model and surpasses conventional B-spline-based registration, without requiring any task-specific model training. Ablation experiments confirm that multi-level feature fusion and modest diffusion noise improve performance.
Problem

Research questions and friction points this paper is trying to address.

Establishes spatial correspondence between 3D medical images for longitudinal analysis
Addresses failures of intensity-based methods in low-contrast anatomical regions
Leverages diffusion model features without task-specific training requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages pretrained diffusion model features
Fuses multi-scale activations into descriptors
Uses cosine similarity for matching
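The matching step above can be sketched as a brute-force cosine-similarity search restricted to a local window (the local-search prior). The function name, window radius, and descriptor layout are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def match_voxel(desc_fixed, desc_moving, voxel, radius=2):
    """Find the best match for `voxel` of the moving volume inside a
    local window of the fixed volume, by cosine similarity.

    desc_fixed, desc_moving: (C, D, H, W) voxel-descriptor volumes.
    voxel: (z, y, x) index into the moving volume; the search window
    is centred at the same location in the fixed volume, encoding the
    prior that true correspondences lie nearby.
    """
    q = desc_moving[(slice(None),) + tuple(voxel)]
    q = q / (np.linalg.norm(q) + 1e-8)
    best, best_sim = None, -np.inf
    z0, y0, x0 = voxel
    D, H, W = desc_fixed.shape[1:]
    for z in range(max(0, z0 - radius), min(D, z0 + radius + 1)):
        for y in range(max(0, y0 - radius), min(H, y0 + radius + 1)):
            for x in range(max(0, x0 - radius), min(W, x0 + radius + 1)):
                v = desc_fixed[:, z, y, x]
                sim = q @ v / (np.linalg.norm(v) + 1e-8)
                if sim > best_sim:
                    best, best_sim = (z, y, x), sim
    return best, best_sim
```

In practice such a search would be vectorised over all voxels at once; the triple loop is kept here only to make the window logic explicit.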
Xingyu Zhang
Horizon Robotics Inc
NLP · VLM · AD
Anna Reithmeir
School of Computation, Information and Technology, Technical University of Munich; Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich; Munich Center for Machine Learning (MCML)
Fryderyk Kögl
School of Computation, Information and Technology, Technical University of Munich; Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich; Munich Center for Machine Learning (MCML); Institute for Diagnostic and Interventional Radiology, Klinikum Rechts der Isar
Rickmer Braren
Technical University of Munich
Radiology · Quantitative Image Analysis · Artificial Intelligence · Oncologic Imaging · Pancreatic Cancer
Julia A. Schnabel
School of Computation, Information and Technology, Technical University of Munich; Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich; Munich Center for Machine Learning (MCML); School of Biomedical Engineering & Imaging Sciences, King’s College London
Daniel M. Lang
Helmholtz Munich, Technical University of Munich
Medical Imaging · Self-Supervised Learning · Anomaly Detection · Deep Learning