🤖 AI Summary
To address the performance bottleneck of OCR for Meiji–Taishō-era modern Japanese (Kindai) texts caused by scarce annotated data, this paper proposes a domain adaptation method leveraging parallel text-image pairs and self-attention feature distance constraints. We construct a parallel dataset comprising historical text-line images and their modern-font counterparts, and design a self-attention feature distance loss that jointly incorporates Euclidean distance and Maximum Mean Discrepancy (MMD). Integrated into a Transformer-based OCR model, this loss enforces cross-font representation alignment without requiring manual annotations on historical documents. The approach significantly enhances robustness against glyph variation and layout degradation. Experiments demonstrate character error rate reductions of 2.23% (with Euclidean distance) and 3.94% (with MMD) over the baseline, establishing a scalable solution for low-resource historical document digitization.
📝 Abstract
Kindai documents, written in modern Japanese from the late 19th to early 20th century, hold significant historical value for researchers studying the societal structures, daily life, and environmental conditions of that period. However, transcribing these documents remains a labor-intensive and time-consuming task, resulting in limited annotated data for training optical character recognition (OCR) systems. This research addresses the challenge of data scarcity by leveraging parallel text-line images, i.e., pairs of original Kindai text lines and their counterparts rendered in contemporary Japanese fonts, to augment training datasets. We introduce a distance-based objective function that minimizes the gap between the self-attention features of the parallel image pairs. Specifically, we explore Euclidean distance and Maximum Mean Discrepancy (MMD) as domain adaptation metrics. Experimental results demonstrate that our method reduces the character error rate (CER) by 2.23% and 3.94% over a Transformer-based OCR baseline when using Euclidean distance and MMD, respectively. Furthermore, our approach improves the discriminative quality of the self-attention representations, leading to more effective OCR performance on historical documents.
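The two distance metrics named in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the feature shapes, the RBF kernel bandwidth, and the random stand-in features are all illustrative assumptions; in the actual method these distances would be computed on self-attention features extracted from the Transformer OCR encoder for a Kindai image and its parallel modern-font rendering.

```python
import numpy as np

def euclidean_feature_distance(f_src, f_tgt):
    """Mean squared Euclidean distance between paired feature vectors."""
    return float(np.mean(np.sum((f_src - f_tgt) ** 2, axis=1)))

def mmd_rbf(f_src, f_tgt, sigma=1.0):
    """Biased estimator of squared MMD with an RBF (Gaussian) kernel."""
    def kernel(a, b):
        # Pairwise squared distances, then Gaussian kernel values.
        d2 = (np.sum(a ** 2, axis=1)[:, None]
              + np.sum(b ** 2, axis=1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    k_ss = kernel(f_src, f_src).mean()
    k_tt = kernel(f_tgt, f_tgt).mean()
    k_st = kernel(f_src, f_tgt).mean()
    return float(k_ss + k_tt - 2.0 * k_st)

# Stand-in self-attention features (8 tokens, 16 dims) for a parallel pair;
# the modern-font features are a slightly perturbed copy of the Kindai ones.
rng = np.random.default_rng(0)
feats_kindai = rng.normal(size=(8, 16))
feats_modern = feats_kindai + 0.1 * rng.normal(size=(8, 16))

loss_euc = euclidean_feature_distance(feats_kindai, feats_modern)
loss_mmd = mmd_rbf(feats_kindai, feats_modern)
```

Either scalar can be added to the OCR training loss as a domain adaptation term; the key difference is that the Euclidean term aligns features pointwise per token pair, whereas MMD matches the overall feature distributions of the two domains.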