🤖 AI Summary
To address the performance bottleneck of OCR for Meiji–Taishō-era modern Japanese (Kindai) texts caused by scarce annotated data, this paper proposes a domain adaptation method leveraging parallel text-image pairs and self-attention feature distance constraints. We construct a parallel dataset comprising historical text-line images and their modern-font counterparts, and design a self-attention feature distance loss that jointly incorporates Euclidean distance and Maximum Mean Discrepancy (MMD). Integrated into a Transformer-based OCR model, this loss enforces cross-font representation alignment without requiring manual annotations on historical documents. The approach significantly enhances robustness against glyph variation and layout degradation. Experiments demonstrate character error rate reductions of 2.23% (with Euclidean distance) and 3.94% (with MMD) over the baseline, establishing a scalable solution for low-resource historical document digitization.
📝 Abstract
Kindai documents, written in modern Japanese from the late 19th to early 20th century, hold significant historical value for researchers studying the societal structures, daily life, and environmental conditions of that period. However, transcribing these documents remains a labor-intensive and time-consuming task, resulting in limited annotated data for training optical character recognition (OCR) systems. This research addresses the challenge of data scarcity by leveraging parallel text-line images, i.e., pairs of original Kindai text lines and their counterparts rendered in contemporary Japanese fonts, to augment training datasets. We introduce a distance-based objective function that minimizes the gap between the self-attention features of the parallel image pairs. Specifically, we explore Euclidean distance and Maximum Mean Discrepancy (MMD) as domain adaptation metrics. Experimental results demonstrate that our method reduces the character error rate (CER) by 2.23% and 3.94% over a Transformer-based OCR baseline when using Euclidean distance and MMD, respectively. Furthermore, our approach improves the discriminative quality of the self-attention representations, leading to more effective OCR performance on historical documents.
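The two distance metrics named in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the feature shapes, the RBF kernel bandwidth, and the random stand-in features are all illustrative assumptions; in the actual method these distances would be computed on self-attention features extracted from the Transformer OCR encoder for a Kindai image and its parallel modern-font rendering.

```python
import numpy as np

def euclidean_feature_distance(f_src, f_tgt):
    """Mean squared Euclidean distance between paired feature vectors."""
    return float(np.mean(np.sum((f_src - f_tgt) ** 2, axis=1)))

def mmd_rbf(f_src, f_tgt, sigma=1.0):
    """Biased estimator of squared MMD with an RBF (Gaussian) kernel."""
    def kernel(a, b):
        # Pairwise squared distances, then Gaussian kernel values.
        d2 = (np.sum(a ** 2, axis=1)[:, None]
              + np.sum(b ** 2, axis=1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    k_ss = kernel(f_src, f_src).mean()
    k_tt = kernel(f_tgt, f_tgt).mean()
    k_st = kernel(f_src, f_tgt).mean()
    return float(k_ss + k_tt - 2.0 * k_st)

# Stand-in self-attention features (8 tokens, 16 dims) for a parallel pair;
# the modern-font features are a slightly perturbed copy of the Kindai ones.
rng = np.random.default_rng(0)
feats_kindai = rng.normal(size=(8, 16))
feats_modern = feats_kindai + 0.1 * rng.normal(size=(8, 16))

loss_euc = euclidean_feature_distance(feats_kindai, feats_modern)
loss_mmd = mmd_rbf(feats_kindai, feats_modern)
```

Either scalar can be added to the OCR training loss as a domain adaptation term; the key difference is that the Euclidean term aligns features pointwise per token pair, whereas MMD matches the overall feature distributions of the two domains.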