Optimal Transport for Handwritten Text Recognition in a Low-Resource Regime

📅 2025-09-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In low-resource handwritten text recognition (HTR), performance is severely constrained by the scarcity of labeled data. To address this, we propose an iterative self-bootstrapping framework that integrates lexical priors. Our key innovation is the first application of optimal transport to align visual features with semantic word embeddings across modalities, enabling interpretable, high-confidence pseudo-label generation for unlabeled images—without requiring ground-truth annotations—and driving iterative model refinement. The method synergistically combines deep visual representations, pretrained word embeddings, and few-shot self-supervised learning. Evaluated on multiple low-resource HTR benchmarks, our approach achieves significant improvements over state-of-the-art few-shot and semi-supervised methods using only 1–5 labeled samples per class, demonstrating a substantial reduction in reliance on large-scale annotated datasets.

Technology Category

Application Category

📝 Abstract
Handwritten Text Recognition (HTR) is a task of central importance in the field of document image understanding. State-of-the-art methods for HTR require the use of extensive annotated sets for training, making them impractical for low-resource domains like historical archives or limited-size modern collections. This paper introduces a novel framework that, unlike the standard HTR model paradigm, can leverage mild prior knowledge of lexical characteristics; this is ideal for scenarios where labeled data are scarce. We propose an iterative bootstrapping approach that aligns visual features extracted from unlabeled images with semantic word representations using Optimal Transport (OT). Starting with a minimal set of labeled examples, the framework iteratively matches word images to text labels, generates pseudo-labels for high-confidence alignments, and retrains the recognizer on the growing dataset. Numerical experiments demonstrate that our iterative visual-semantic alignment scheme significantly improves recognition accuracy on low-resource HTR benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Addresses handwritten text recognition with limited annotated training data
Leverages lexical knowledge and optimal transport for visual-semantic alignment
Proposes iterative bootstrapping to improve accuracy in low-resource scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal Transport aligns visual features with word semantics
Iterative bootstrapping generates pseudo-labels from high-confidence alignments
Framework leverages lexical knowledge for low-resource handwritten recognition