AI Summary
To address the weak scene generalization and the inherent trade-off between content alignment and geometric structure preservation in unsupervised deep image stitching, this paper proposes RopStitch. Methodologically, it introduces a dual-branch network that jointly leverages semantic invariance and fine-grained features, integrating pretrained feature extraction with learnable correlation layers. It further pioneers the concept of a "virtual optimal plane," modeling geometric consistency via homography decomposition and bidirectional planar mapping. Finally, an iterative coefficient predictor enables end-to-end optimization. Evaluated on multiple real-world datasets, RopStitch significantly improves stitching robustness and visual naturalness. Notably, it demonstrates superior generalization to unseen scenes compared to state-of-the-art methods, validating its effectiveness in challenging, unconstrained environments.
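The correlation-level fusion of the two branches can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cosine-similarity correlation, the function names, and the blend weight `alpha` (the "controllable factor") are all assumptions for the sketch.

```python
import numpy as np

def correlation(feat_a, feat_b, eps=1e-8):
    """Cosine-similarity correlation volume between two (C, H, W) feature maps."""
    a = feat_a.reshape(feat_a.shape[0], -1)  # (C, H*W)
    b = feat_b.reshape(feat_b.shape[0], -1)
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + eps)
    return a.T @ b  # (H*W, H*W): similarity of every position pair

def fused_correlation(pre_a, pre_b, fine_a, fine_b, alpha=0.5):
    """Blend the pretrained-branch and learnable-branch correlation volumes.

    alpha is a controllable factor: 1.0 uses only the semantically invariant
    (pretrained) correlations, 0.0 only the fine-grained (learnable) ones.
    """
    return alpha * correlation(pre_a, pre_b) + (1.0 - alpha) * correlation(fine_a, fine_b)
```

Merging at the correlation level, rather than at the feature level, lets the two branches keep separate feature spaces while still producing a single matching cost for the downstream warp estimator.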
Abstract
We present RopStitch, an unsupervised deep image stitching framework that achieves both robustness and naturalness. To ensure robustness, we incorporate a universal prior of content perception into the stitching model via a dual-branch architecture. It separately captures coarse and fine features and integrates them to generalize well across diverse unseen real-world scenes. Concretely, the dual-branch model consists of a pretrained branch that captures semantically invariant representations and a learnable branch that extracts fine-grained discriminative features; the two are merged at the correlation level by a controllable factor. In addition, since content alignment and structural preservation are often contradictory, we propose the concept of a virtual optimal plane to relieve this conflict. To this end, we model the problem as estimating homography decomposition coefficients, and design an iterative coefficient predictor and a minimal semantic distortion constraint to identify the optimal plane. This scheme is incorporated into RopStitch by warping both views onto the optimal plane bidirectionally. Extensive experiments across various datasets demonstrate that RopStitch significantly outperforms existing methods, particularly in scene robustness and content naturalness. The code is available at https://github.com/MmelodYy/RopStitch.
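The bidirectional warp onto a virtual plane can be sketched as a decomposition of the full homography H (view A to view B) into two partial warps. Note the linear blend toward the identity below is an assumed decomposition rule for illustration only; the paper instead predicts the decomposition coefficients with an iterative predictor under a minimal semantic distortion constraint.

```python
import numpy as np

def split_homography(H, beta):
    """Split a full homography H (view A -> view B) into two partial warps.

    Both views are warped onto an intermediate "virtual plane": view A by
    H_a and view B by H_b, constructed so that inv(H_b) @ H_a == H, i.e.
    the two partial warps compose back to the original alignment.
    beta in [0, 1] places the plane (0: A's image plane, 1: B's plane).
    """
    I = np.eye(3)
    # Assumed rule: blend H toward the identity by beta.
    H_a = (1.0 - beta) * I + beta * H
    # Choose H_b so the composition constraint holds exactly.
    H_b = H_a @ np.linalg.inv(H)
    return H_a, H_b
```

Splitting the warp this way distributes the projective distortion across both views instead of loading it entirely onto one, which is what makes the stitched result look more natural.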