🤖 AI Summary
Pixel-level registration between SAR and optical images is difficult because the two modalities rely on fundamentally different imaging mechanisms. This paper proposes SOMA, a dense cross-modal registration framework that integrates structural gradient priors into deep features. It has two main components: (1) the Feature Gradient Enhancer (FGE), which embeds multi-scale, multi-directional gradient filters into the deep feature space through attention and reconstruction mechanisms to improve cross-modal distinctiveness; and (2) the Global-Local Affine-Flow Matcher (GLAM), which combines a global affine transformation with flow-based local refinement to balance structural consistency and local accuracy. The framework is an end-to-end, coarse-to-fine architecture. On the SEN1-2 and GFGE_SO datasets, SOMA raises CMR@1px by 12.29% and 18.50%, respectively, significantly improving registration precision, and it shows strong robustness and cross-scene generalization.
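As a rough illustration of the global-plus-local idea behind GLAM and of the CMR@1px metric, the sketch below composes a 2x3 affine map with a per-pixel residual flow and scores the result. It is not the paper's implementation: the function names (`affine_grid`, `compose`, `cmr`) are hypothetical, and CMR@1px is assumed here to mean the fraction of evaluated points whose predicted location falls within 1 pixel of the ground truth, which may differ from the paper's exact definition.

```python
import numpy as np


def affine_grid(height: int, width: int, affine: np.ndarray) -> np.ndarray:
    """Map every pixel (x, y) through a 2x3 affine matrix; returns (H, W, 2)."""
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return coords @ affine.T  # (H, W, 2), last axis is (x', y')


def compose(affine: np.ndarray, residual_flow: np.ndarray) -> np.ndarray:
    """Global affine correspondence plus a local per-pixel correction."""
    height, width, _ = residual_flow.shape
    return affine_grid(height, width, affine) + residual_flow


def cmr(pred: np.ndarray, target: np.ndarray, threshold_px: float = 1.0) -> float:
    """Assumed metric: share of points with Euclidean error <= threshold_px."""
    err = np.linalg.norm(pred - target, axis=-1)
    return float((err <= threshold_px).mean())


# Toy ground truth: a translation plus a small local deformation patch.
affine = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, -1.0]])
gt_flow = np.zeros((8, 8, 2))
gt_flow[2:4, 2:4, 0] = 3.0  # 3 px horizontal shift in a 2x2 region
target = compose(affine, gt_flow)

coarse = compose(affine, np.zeros_like(gt_flow))  # affine only
refined = compose(affine, gt_flow)                # affine + residual flow

cmr_coarse = cmr(coarse, target)
cmr_refined = cmr(refined, target)
print(cmr_coarse, cmr_refined)
```

In this toy setup the affine-only estimate misses the locally deformed patch, while adding the residual flow recovers it, which is the intuition for pairing a global model with a local refinement step.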
📝 Abstract
Achieving pixel-level registration between SAR and optical images remains a challenging task due to their fundamentally different imaging mechanisms and visual characteristics. Although deep learning has achieved great success in many cross-modal tasks, its performance on SAR-Optical registration tasks is still unsatisfactory. Gradient-based information has traditionally played a crucial role in handcrafted descriptors by highlighting structural differences. However, such gradient cues have not been effectively leveraged in deep learning frameworks for SAR-Optical image matching. To address this gap, we propose SOMA, a dense registration framework that integrates structural gradient priors into deep features and refines alignment through a hybrid matching strategy. Specifically, we introduce the Feature Gradient Enhancer (FGE), which embeds multi-scale, multi-directional gradient filters into the feature space using attention and reconstruction mechanisms to boost feature distinctiveness. Furthermore, we propose the Global-Local Affine-Flow Matcher (GLAM), which combines affine transformation and flow-based refinement within a coarse-to-fine architecture to ensure both structural consistency and local accuracy. Experimental results demonstrate that SOMA significantly improves registration precision, increasing the CMR@1px by 12.29% on the SEN1-2 dataset and 18.50% on the GFGE_SO dataset. In addition, SOMA exhibits strong robustness and generalizes well across diverse scenes and resolutions.
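To make "multi-scale, multi-directional gradient filters" concrete, here is a minimal sketch of one possible filter bank: first derivatives of a Gaussian steered to several orientations at several scales. This is an assumption chosen for illustration, not the filters used in FGE, and the attention and reconstruction steps described above are omitted. All names (`gaussian_derivative_kernel`, `correlate_same`, `gradient_bank`) are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_derivative_kernel(sigma: float, theta: float) -> np.ndarray:
    """First derivative of a Gaussian along direction theta (radians)."""
    radius = int(np.ceil(3 * sigma))
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    gauss = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))
    # Directional derivative along the unit vector (cos theta, sin theta).
    kernel = -(xs * np.cos(theta) + ys * np.sin(theta)) / sigma**2 * gauss
    return kernel / np.abs(kernel).sum()  # scale so responses are comparable


def correlate_same(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2D cross-correlation with edge padding; output has the input's shape."""
    radius = kernel.shape[0] // 2
    padded = np.pad(image, radius, mode="edge")
    windows = sliding_window_view(padded, kernel.shape)  # (H, W, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def gradient_bank(image: np.ndarray, sigmas=(1.0, 2.0), n_dirs: int = 4) -> np.ndarray:
    """Stack of gradient magnitudes, ordered by scale and then by direction."""
    thetas = [np.pi * k / n_dirs for k in range(n_dirs)]
    responses = [
        np.abs(correlate_same(image, gaussian_derivative_kernel(s, t)))
        for s in sigmas
        for t in thetas
    ]
    return np.stack(responses)  # (len(sigmas) * n_dirs, H, W)


# Toy input: a vertical step edge, which should excite only filters
# with a horizontal component.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
bank = gradient_bank(img)
print(bank.shape)
```

Taking the absolute response discards edge polarity, a common choice in handcrafted SAR-optical descriptors because intensity relationships can invert between the two modalities; whether FGE does the same is not stated here.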