🤖 AI Summary
This paper proposes GDROS, a geometry-guided dense registration framework for the challenging cross-modal registration of optical and SAR imagery under large geometric transformations, a task exacerbated by nonlinear radiometric discrepancies, severe geometric distortions, and heterogeneous noise. Methodologically, GDROS employs a hybrid CNN-Transformer network to extract robust cross-modal features and constructs an iteratively refined multi-scale 4D correlation volume for dense correspondence estimation. Crucially, it introduces a least-squares affine regression module that imposes explicit geometric constraints on the dense optical flow field, effectively suppressing prediction divergence under large deformations. Extensive experiments on the WHU-Opt-SAR, OS, and UBCv2 benchmarks demonstrate that GDROS consistently outperforms state-of-the-art methods across varying spatial resolutions, achieving superior quantitative accuracy (e.g., lower RMSE, higher success rate) and qualitatively precise alignment.
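The all-pairs 4D correlation volume mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of the general RAFT-style construction (an (H, W, H, W) volume of feature similarities, average-pooled into a coarser pyramid level), not the paper's actual implementation; the function names are ours.

```python
import numpy as np

def correlation_volume(feat_a, feat_b):
    """All-pairs 4D correlation between two feature maps.

    feat_a, feat_b: (H, W, C) feature maps from the two modalities.
    Entry [i, j, k, l] of the result is the dot-product similarity
    between pixel (i, j) of feat_a and pixel (k, l) of feat_b,
    scaled by 1/sqrt(C) as is common in RAFT-style matching.
    """
    H, W, C = feat_a.shape
    a = feat_a.reshape(H * W, C)
    b = feat_b.reshape(H * W, C)
    corr = a @ b.T / np.sqrt(C)      # (H*W, H*W) similarity matrix
    return corr.reshape(H, W, H, W)

def pool_volume(corr, k=2):
    """Average-pool the last two dims to form a coarser pyramid level."""
    H, W, H2, W2 = corr.shape
    c = corr.reshape(H, W, H2 // k, k, W2 // k, k)
    return c.mean(axis=(3, 5))       # (H, W, H2//k, W2//k)
```

A multi-scale pyramid is obtained by applying `pool_volume` repeatedly; the update operator then looks up local windows of each level around the current flow estimate at every iteration.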
📝 Abstract
Registration of optical and synthetic aperture radar (SAR) remote sensing images serves as a critical foundation for image fusion and visual navigation tasks. The task is particularly challenging because of the modal discrepancy between the two sensors, primarily manifested as severe nonlinear radiometric differences (NRD), geometric distortions, and noise variations. Under large geometric transformations, existing classical template-based and sparse keypoint-based strategies struggle to achieve reliable registration results for optical-SAR image pairs. To address these limitations, we propose GDROS, a geometry-guided dense registration framework leveraging global cross-modal image interactions. First, we extract cross-modal deep features from optical and SAR images through a CNN-Transformer hybrid feature extraction module, upon which a multi-scale 4D correlation volume is constructed and iteratively refined to establish pixel-wise dense correspondences. Subsequently, we implement a least squares regression (LSR) module to geometrically constrain the predicted dense optical flow field. This geometry guidance mitigates prediction divergence by directly imposing an estimated affine transformation on the final flow predictions. Extensive experiments have been conducted on three representative datasets with different spatial resolutions (WHU-Opt-SAR, OS, and UBCv2), demonstrating the robust performance of our method across imaging resolutions. Qualitative and quantitative results show that GDROS significantly outperforms current state-of-the-art methods on all metrics. Our source code will be released at: https://github.com/Zi-Xuan-Sun/GDROS.
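The geometric constraint in the LSR step amounts to fitting an affine transform to the dense flow field by linear least squares and regenerating the flow that this transform implies. The sketch below shows that idea in NumPy under our own assumptions (pixel-grid coordinates, a plain 2x3 affine model); it is an illustration of the technique, not the paper's module, and the function names are hypothetical.

```python
import numpy as np

def affine_from_flow(flow):
    """Fit a 2x3 affine transform to a dense flow field by least squares.

    flow: (H, W, 2) array giving, at each pixel, the displacement
    (dx, dy) to its correspondence. Solves min_A ||X @ A.T - Y||^2,
    where X holds homogeneous source coordinates and Y the targets.
    """
    H, W, _ = flow.shape
    ys, xs = np.mgrid[0:H, 0:W]
    X = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)], axis=1)  # (N, 3)
    Y = X[:, :2] + flow.reshape(-1, 2)                              # (N, 2) targets
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)                       # (3, 2)
    return A.T                                                      # (2, 3)

def flow_from_affine(A, H, W):
    """Regenerate the geometry-constrained flow implied by affine A."""
    ys, xs = np.mgrid[0:H, 0:W]
    X = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)], axis=1)
    Y = X @ A.T
    return (Y - X[:, :2]).reshape(H, W, 2)
```

Replacing (or blending) the network's raw flow with `flow_from_affine(affine_from_flow(flow), H, W)` is what keeps per-pixel predictions from diverging: every residual that cannot be explained by a global affine motion is averaged out by the least-squares fit.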