🤖 AI Summary
This paper introduces the task of region-wise correspondence prediction for unlabeled manga line art images: establishing semantically consistent and topologically coherent region mappings between images, without requiring pre-existing segmentation or manual annotations. Methodologically, it divides each line drawing into local patches and employs a Transformer-based architecture to model patch-level similarity within and across images. To lift patch-level matches into coherent, contiguous regions, it applies an edge-aware clustering strategy coupled with a topology-preserving region matching algorithm. To support this research, the authors build an automated annotation pipeline and release a manually refined benchmark dataset. Experiments report a patch-matching accuracy of 96.34%, and the inferred region correspondences prove robust and practically useful in downstream applications such as automatic colorization and in-between frame generation.
📝 Abstract
Understanding region-wise correspondence between manga line art images is a fundamental task in manga processing, enabling downstream applications such as automatic line art colorization and in-between frame generation. However, this task remains largely unexplored, especially in realistic scenarios without pre-existing segmentation or annotations. In this paper, we introduce a novel and practical task: predicting region-wise correspondence between raw manga line art images without any pre-existing labels or masks. To tackle this problem, we divide each line art image into a set of patches and propose a Transformer-based framework that learns patch-level similarities within and across images. We then apply edge-aware clustering and a region matching algorithm to convert patch-level predictions into coherent region-level correspondences. To support training and evaluation, we develop an automatic annotation pipeline and manually refine a subset of the data to construct benchmark datasets. Experiments on multiple datasets demonstrate that our method achieves high patch-level accuracy (e.g., 96.34%) and generates consistent region-level correspondences, highlighting its potential for real-world manga applications.
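The patch-level matching step described in the abstract can be illustrated with a minimal sketch: split each image into non-overlapping patches, embed them, and pair each patch in one image with its most similar patch in the other. The random linear projection below is only an illustrative stand-in for the paper's Transformer encoder, and all function names here are hypothetical, not from the authors' code.

```python
import numpy as np

def to_patches(img, p):
    """Split an HxW line-art image into non-overlapping p x p patches,
    flattened to vectors (one row per patch)."""
    h, w = img.shape
    return (img.reshape(h // p, p, w // p, p)
               .swapaxes(1, 2)
               .reshape(-1, p * p)
               .astype(np.float32))

def match_patches(img_a, img_b, p=4, dim=16, seed=0):
    """Toy cross-image patch matching: embed patches with a fixed random
    linear projection (a stand-in for a learned Transformer encoder) and
    pair each patch of img_a with its most similar patch of img_b by
    cosine similarity."""
    rng = np.random.default_rng(seed)
    pa, pb = to_patches(img_a, p), to_patches(img_b, p)
    proj = rng.standard_normal((p * p, dim)).astype(np.float32)  # toy embedding
    ea, eb = pa @ proj, pb @ proj
    ea /= np.linalg.norm(ea, axis=1, keepdims=True) + 1e-8  # unit-normalize
    eb /= np.linalg.norm(eb, axis=1, keepdims=True) + 1e-8
    sim = ea @ eb.T                     # patch-level similarity matrix
    return sim.argmax(axis=1), sim      # best match in img_b per patch of img_a
```

Matching an image against itself sends each patch to its own index, since identical embeddings have cosine similarity 1. In the full method, these patch matches are then grouped by edge-aware clustering into region-level correspondences; that aggregation step is omitted here.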