🤖 AI Summary
Addressing the scarcity of pixel-level annotations and the high cost of producing them for SAR-optical image matching, this paper proposes S2M2-SAR, a semi-supervised multi-scale matching framework. Leveraging a small set of labeled image pairs and abundant unlabeled ones, S2M2-SAR generates pseudo ground-truth similarity heatmaps by combining shallow- and deep-level feature matching results, and uses them to guide training. It further introduces a cross-modal feature enhancement module and a mutual-information minimization loss that disentangle modality-shared representations from modality-specific features. The pseudo-labeling mechanism and multi-scale matching strategy together significantly improve cross-modal feature alignment accuracy. On benchmark datasets, S2M2-SAR outperforms existing semi-supervised methods and performs on par with fully supervised state-of-the-art approaches, demonstrating its dual advantages in annotation efficiency and matching robustness.
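The summary does not spell out how the two matching scales are fused into a pseudo label; a minimal PyTorch-style sketch of one plausible fusion rule is given below. The function name `fuse_pseudo_heatmap`, the mixing weight `alpha`, and the confidence threshold are all hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def fuse_pseudo_heatmap(shallow_sim: torch.Tensor,
                        deep_sim: torch.Tensor,
                        alpha: float = 0.5,
                        conf_thresh: float = 0.8):
    """Fuse shallow- and deep-level similarity maps into a pseudo label.

    shallow_sim, deep_sim: (B, H, W) similarity heatmaps in [0, 1],
    e.g. from normalized cross-correlation at two feature scales.
    Returns the fused heatmap and a per-image mask of pairs confident
    enough to serve as pseudo ground truth. This is an illustrative
    guess at the fusion step, not the paper's actual rule.
    """
    # Upsample the coarse deep-level map to the shallow map's resolution.
    deep_up = F.interpolate(deep_sim.unsqueeze(1),
                            size=shallow_sim.shape[-2:],
                            mode="bilinear",
                            align_corners=False).squeeze(1)
    # Weighted combination of the two scales.
    fused = alpha * shallow_sim + (1.0 - alpha) * deep_up
    # Keep only image pairs whose peak fused similarity is high,
    # so that low-confidence matches never become training targets.
    confident = fused.amax(dim=(-2, -1)) >= conf_thresh
    return fused, confident
```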
📝 Abstract
Driven by the complementary nature of optical and synthetic aperture radar (SAR) images, SAR-optical image matching has garnered significant interest. Most existing SAR-optical image matching methods capture matching features under the supervision of pixel-level correspondences within SAR-optical image pairs; such supervision, however, requires time-consuming and complex manual annotation, making it difficult to collect enough labeled SAR-optical image pairs. To handle this, we design a semi-supervised SAR-optical image matching pipeline that leverages both scarce labeled and abundant unlabeled image pairs, and propose a semi-supervised multi-scale matching method for SAR-optical image matching (S2M2-SAR). Specifically, we pseudo-label unlabeled SAR-optical image pairs with pseudo ground-truth similarity heatmaps obtained by combining deep- and shallow-level matching results, and train the matching model on both labeled and pseudo-labeled similarity heatmaps. In addition, we introduce a cross-modal feature enhancement module trained with a cross-modality mutual independence loss, which requires no ground-truth labels. This unsupervised objective promotes the separation of modality-shared and modality-specific features by encouraging statistical independence between them, enabling effective feature disentanglement across optical and SAR modalities. To evaluate the effectiveness of S2M2-SAR, we compare it with existing competitors on benchmark datasets. Experimental results show that S2M2-SAR not only surpasses existing semi-supervised methods but also achieves performance competitive with fully supervised SOTA methods, demonstrating its efficiency and practical potential.
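The abstract does not specify how the cross-modality mutual independence loss is computed. One common proxy for statistical independence between two feature sets is to drive their cross-correlation matrix toward zero; the sketch below illustrates that idea under assumed (B, D) feature shapes. The formulation and the name `mutual_independence_loss` are assumptions, not the paper's exact loss.

```python
import torch

def mutual_independence_loss(shared: torch.Tensor,
                             specific: torch.Tensor,
                             eps: float = 1e-5) -> torch.Tensor:
    """Penalize dependence between modality-shared and modality-specific
    features, each of shape (B, D), via their cross-correlation.

    A zero cross-correlation matrix is a necessary (not sufficient)
    condition for statistical independence, so this is only an
    illustrative proxy for the paper's objective.
    """
    # Standardize each feature dimension across the batch.
    s = (shared - shared.mean(dim=0)) / (shared.std(dim=0) + eps)
    p = (specific - specific.mean(dim=0)) / (specific.std(dim=0) + eps)
    # Cross-correlation matrix between the two feature sets (D x D).
    c = s.t() @ p / s.shape[0]
    # Drive all cross-correlations toward zero; no labels are needed,
    # matching the abstract's claim of an unsupervised objective.
    return (c ** 2).mean()
```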