🤖 AI Summary
Infrared–visible image fusion suffers from degraded quality due to misalignment during acquisition, yet existing methods struggle to achieve high-fidelity unsupervised registration and fusion due to the absence of ground-truth aligned pairs. To address this, we propose the first self-supervised bidirectional self-registration framework that requires no ground-truth annotations. Our approach introduces a proxy data generator and its inverse to synthesize pseudo-global disparities; incorporates a neighborhood-aware dynamic alignment loss to mitigate cross-modal discrepancies; and enforces global–local disparity consistency alongside cross-modal edge alignment. Extensive experiments on multiple misaligned datasets demonstrate significant improvements in both registration accuracy and fusion quality, consistently outperforming state-of-the-art supervised and unsupervised methods. The source code will be made publicly available.
📝 Abstract
Acquiring accurately aligned multi-modal image pairs is fundamental to high-quality multi-modal image fusion. To address the lack of ground truth in current multi-modal image registration and fusion methods, we propose a novel self-supervised **B**i-directional **S**elf-**R**egistration framework (**B-SR**). Specifically, B-SR uses a proxy data generator (PDG) and an inverse proxy data generator (IPDG) to achieve self-supervised global-local registration. Spatially misaligned visible-infrared image pairs are aligned by the registration module to obtain global differences. The same image pairs are transformed by the PDG (e.g., cropping, flipping, and stitching) and then aligned to obtain local differences. The IPDG converts these local differences into pseudo-global differences, which are enforced to be consistent with the global differences. Furthermore, to eliminate the effect of the modality gap on the registration module, we design a neighborhood dynamic alignment loss that achieves cross-modal image edge alignment. Extensive experiments on misaligned multi-modal images demonstrate the effectiveness of the proposed method for multi-modal image alignment and fusion compared with competing methods. Our code will be publicly available.
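The global-local consistency idea can be illustrated with a minimal sketch. Here we assume (purely for illustration; the paper's actual PDG includes cropping, flipping, and stitching, and operates on images rather than disparity fields) that the proxy transform is a horizontal flip, and that disparities are dense 2-channel (dx, dy) fields. The function names `pdg_flip`, `ipdg_flip`, and `consistency_loss` are hypothetical, not from the paper:

```python
import numpy as np

def pdg_flip(img):
    # Proxy data generator (hypothetical simplification): a horizontal flip
    # applied identically to both modalities before re-registration.
    return img[:, ::-1]

def ipdg_flip(local_disp):
    # Inverse proxy data generator: map a disparity field estimated on the
    # flipped pair back to the original frame. Undoing a horizontal flip
    # reverses the column order and negates the x-component of the disparity.
    out = local_disp[:, ::-1].copy()
    out[..., 0] = -out[..., 0]
    return out

def consistency_loss(global_disp, pseudo_global_disp):
    # Global-local disparity consistency: L1 distance between the disparity
    # estimated on the original pair and the pseudo-global disparity
    # recovered from the proxy pair via the IPDG.
    return float(np.abs(global_disp - pseudo_global_disp).mean())
```

With a perfect registration module, the disparity estimated on the flipped pair would itself be the flipped, x-negated version of the global disparity, so the IPDG recovers the global field exactly and the consistency loss is zero; the loss penalizes any deviation between the two estimates during training.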