🤖 AI Summary
Handcrafted feature fusion strategies limit the performance of deformable medical image registration. To address this, this paper proposes an end-to-end learnable cross-scale feature fusion framework. Our method builds a cascaded fusion encoder on a U-Net variant and integrates differentiable spatial transformer networks (STNs) with adaptive gating fusion modules, so that deformation field estimation and multi-scale feature aggregation are optimized jointly. Training is unsupervised: a mutual information loss removes the need for ground-truth deformation fields. The core contribution is the first learnable cross-scale fusion mechanism, which is fully data-driven and free of manual design rules. Evaluated on the LPBA40 and OASIS brain MRI datasets, our approach achieves state-of-the-art performance compared to prior methods: a 3.2% improvement in Dice score, a 1.8 mm reduction in Hausdorff distance, and a 40% speedup in inference.
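To make the unsupervised objective concrete, the following is a minimal sketch of a histogram-based mutual information score between a fixed image and a warped moving image, negated to form a loss. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say which estimator is used, the names `mutual_information` and `mi_loss`, the bin count, and the intensity range are all hypothetical, and hard binning as shown here is not differentiable, so an actual training loss would need a soft (for example Parzen-window) variant.

```python
import math
from collections import Counter


def mutual_information(fixed, moving, bins=8, lo=0.0, hi=1.0):
    """Histogram estimate of mutual information (in nats) between two
    equal-length sequences of voxel intensities.

    Assumption: intensities are normalized to [lo, hi]; values outside
    that range are clamped. Hard binning is used for clarity only.
    """
    if len(fixed) != len(moving) or len(fixed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    def bin_index(value):
        # Clamp to [lo, hi], then map to an integer bin in [0, bins - 1].
        value = min(max(value, lo), hi)
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    pairs = [(bin_index(f), bin_index(m)) for f, m in zip(fixed, moving)]
    n = len(pairs)

    joint = Counter(pairs)                        # joint histogram counts
    fixed_counts = Counter(i for i, _ in pairs)   # marginal counts, fixed image
    moving_counts = Counter(j for _, j in pairs)  # marginal counts, moving image

    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j) simplifies to count * n / (count_i * count_j).
        mi += p_ij * math.log(count * n / (fixed_counts[i] * moving_counts[j]))
    return mi


def mi_loss(fixed, moving, **kwargs):
    """Negated mutual information: lower is better, so it can be minimized."""
    return -mutual_information(fixed, moving, **kwargs)
```

A well-aligned pair shares more intensity structure than a misaligned one, so it yields higher mutual information and a lower loss. Because the score depends only on the two images, no ground-truth deformation field is involved.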