🤖 AI Summary
This work addresses the challenge of high-precision pixel-level alignment of multimodal images of historical panel paintings, which is hindered by resolution discrepancies, large image sizes, non-rigid deformations, and inter-modal content inconsistencies. To overcome these issues, the study proposes a single-stage coarse-to-fine non-rigid registration framework that leverages craquelure (the fine crack patterns visible across all modalities) as a universal feature. The method jointly detects and describes craquelure keypoints using a convolutional neural network, performs local patch matching via a graph neural network, and achieves global non-rigid alignment through thin-plate spline interpolation. Multi-scale keypoint refinement and filtering of matches by their reprojection error under local homographies further enhance accuracy. Evaluated on a newly curated multimodal panel painting dataset, the approach significantly outperforms existing keypoint- and dense-matching-based methods, with ablation studies confirming the contribution of each component and establishing a new state of the art in registration precision.
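To make the thin-plate spline step concrete, here is a minimal numpy sketch of 2-D TPS interpolation from matched keypoints. This is an illustration of the general technique only, not the paper's implementation; the function names (`tps_fit`, `warp`) are made up for this example.

```python
import numpy as np

def tps_fit(src, dst):
    """Fit a 2-D thin-plate spline mapping src -> dst control points.

    src, dst: (n, 2) arrays of matched keypoint coordinates.
    Returns a function that warps arbitrary (m, 2) query points.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = src.shape[0]
    # TPS radial basis U(r) = r^2 log(r^2), with U(0) defined as 0.
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = d2 * np.log(d2, out=np.zeros_like(d2), where=d2 > 0)
    P = np.hstack([np.ones((n, 1)), src])        # affine terms [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    params = np.linalg.solve(A, b)               # spline + affine weights
    w, a = params[:n], params[n:]

    def warp(pts):
        pts = np.asarray(pts, float)
        d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
        U = d2 * np.log(d2, out=np.zeros_like(d2), where=d2 > 0)
        return U @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a

    return warp
```

A TPS passes exactly through its control points, so `warp(src)` reproduces `dst`; between control points it gives the smoothest (minimum bending energy) interpolation, which is what makes it attractive for gluing sparse craquelure matches into a global non-rigid warp.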
📝 Abstract
Art technological investigations of historical panel paintings rely on acquiring multi-modal image data, including visible light photography, infrared reflectography, ultraviolet fluorescence photography, X-radiography, and macro photography. For a comprehensive analysis, the multi-modal images require pixel-wise alignment, which is still often performed manually. Multi-modal image registration can reduce this laborious manual work, is substantially faster, and enables higher precision. Due to varying image resolutions, very large image sizes, non-rigid distortions, and modality-dependent image content, registration is challenging. Therefore, we propose a coarse-to-fine non-rigid multi-modal registration method that relies on sparse keypoints and thin-plate splines for efficiency. Historical paintings exhibit a fine crack pattern on the paint layer, called craquelure, which is captured by all imaging systems and is well-suited as a feature for registration. In our one-stage non-rigid registration approach, we employ a convolutional neural network for joint keypoint detection and description based on the craquelure and a graph neural network for patch-based descriptor matching, and we filter matches based on homography reprojection errors in local areas. For coarse-to-fine registration, we introduce a novel multi-level keypoint refinement approach to register mixed-resolution images up to the highest resolution. We created a multi-modal dataset of panel paintings with a large number of keypoint annotations, and a large test set comprising five multi-modal domains and varying image resolutions. The ablation study demonstrates the effectiveness of all modules of our refinement method. Our proposed approaches achieve the best registration results compared to competing keypoint-based, dense-matching, and refinement methods.
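The match-filtering step described above (estimating a homography within a local area and discarding matches with a large reprojection error) can be sketched as follows. This is a generic direct-linear-transform illustration under assumed inputs, not the paper's code; `fit_homography`, `inlier_mask`, and the 3-pixel threshold are hypothetical choices for this example.

```python
import numpy as np

def fit_homography(src, dst):
    """Plain DLT homography estimate from >= 4 matches
    (coordinate normalization omitted for brevity)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Null-space solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def inlier_mask(src, dst, thresh=3.0):
    """Keep matches whose reprojection error under the local
    homography stays below `thresh` pixels."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    H = fit_homography(src, dst)
    proj = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    proj = proj[:, :2] / proj[:, 2:3]          # perspective divide
    return np.linalg.norm(proj - dst, axis=1) < thresh
```

Because a homography is only valid locally on a warped, non-planar panel, applying this test per patch (rather than globally) tolerates the non-rigid deformation while still rejecting spurious descriptor matches.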