Understanding Cross-Language Transfer Improvements in Low-Resource HTR: The Role of Sequence Modeling

📅 2026-05-07

🤖 AI Summary
This study investigates what drives cross-lingual transfer in low-resource handwritten text recognition (HTR): shared visual representations or sequence modeling. In controlled experiments on Arabic, Urdu, and Persian handwriting datasets, the authors compare CNN-only models with CTC decoding against full CRNN models under both single-script and multi-script training. In low-resource regimes with 100, 500, or 1,000 training samples, CNN-only models show limited or unstable improvements from multi-script training, whereas CRNN models improve, particularly in the most data-constrained settings. Measured as the change in character error rate (CER), the results suggest that sequence modeling, rather than visual similarity of character shapes alone, plays an important role in effective cross-lingual transfer in HTR.
📝 Abstract
Handwritten Text Recognition (HTR) for Arabic-script languages benefits from cross-language joint training under low-resource conditions, particularly when using CRNN-based models that combine convolutional encoders with sequence modeling. However, it remains unclear whether these improvements are better explained by shared visual representations or sequence-level dependencies. In this work, we conduct a controlled architectural study of line-level Arabic-script HTR, comparing CNN-only models with CTC decoding and CRNN models under identical single-script and multi-script training regimes. Experiments are performed on Arabic (KHATT), Urdu (NUST-UHWR), and Persian (PHTD) datasets under low-resource settings (K in {100, 500, 1000}). Our results show a clear divergence in transfer behavior: while CNN-only models exhibit limited or unstable improvements, CRNN models achieve better performance under multi-script training, particularly in the most data-constrained regimes. Focusing on transfer improvements (delta CER) rather than absolute performance, we find that cross-language improvements are associated with sequence-level modeling, while sharing visual representations learned by the CNN encoder, corresponding to similarities in character shapes across scripts, alone appears to be insufficient. This finding suggests that contextual modeling plays an important role in enabling effective transfer in low-resource scenarios, and that similar behavior may extend to other low-resource language settings.
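The abstract reports transfer improvements as delta CER rather than absolute CER. As a minimal sketch of how such a metric could be computed, the Python below derives a corpus-level character error rate from Levenshtein distance and then takes the difference between a single-script and a multi-script run. The function names and the sign convention (positive means multi-script training helped) are assumptions made for illustration; the abstract does not specify the authors' exact formulation.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference set contains no characters")
    return total_edits / total_chars


def delta_cer(cer_single: float, cer_multi: float) -> float:
    """Assumed convention: positive values mean multi-script training lowered CER."""
    return cer_single - cer_multi
```

Under this assumed convention, a model whose CER drops from 0.50 with single-script training to 0.25 with multi-script training has a delta CER of 0.25. Comparing that quantity across the CNN-only and CRNN models at each K is the kind of analysis the abstract describes.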
Problem

Research questions and friction points this paper is trying to address.

Handwritten Text Recognition
Cross-Language Transfer
Low-Resource
Sequence Modeling
Arabic-script
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-language transfer
sequence modeling
low-resource HTR
CRNN
contextual modeling
Sana Al-azzawi
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Luleå, Sweden
Chang Liu
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Luleå, Sweden
Nudrat Habib
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Luleå, Sweden
Elisa Barney
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Luleå, Sweden
Marcus Liwicki
Luleå University of Technology, EISLAB, Machine Learning, Sweden
Deep Learning, Artificial Intelligence, Document Analysis, Pattern Recognition, Applied AI