N-gram Injection into Transformers for Dynamic Language Model Adaptation in Handwritten Text Recognition

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of Transformer-based handwritten text recognition models when the linguistic distribution of the test data shifts away from the training corpus. To mitigate this without fine-tuning, the authors propose an inference-time approach that dynamically integrates an external n-gram language model into the Transformer decoder through an early feature-injection mechanism, adapting the network to the linguistic characteristics of the target domain. Notably, the method is claimed to be the first to leverage an external language model at inference without requiring additional image-text paired data. Experiments on three handwritten text datasets show that the approach substantially narrows the performance gap between source and target domains, confirming its effectiveness and generalization under distribution shift.

📝 Abstract
Transformer-based encoder-decoder networks have recently achieved impressive results in handwritten text recognition, partly thanks to their auto-regressive decoder which implicitly learns a language model. However, such networks suffer from a large performance drop when evaluated on a target corpus whose language distribution is shifted from the source text seen during training. To retain recognition accuracy despite this language shift, we propose an external n-gram injection (NGI) for dynamic adaptation of the network's language modeling at inference time. Our method allows switching to an n-gram language model estimated on a corpus close to the target distribution, therefore mitigating bias without any extra training on target image-text pairs. We opt for an early injection of the n-gram into the transformer decoder so that the network learns to fully leverage text-only data at the low additional cost of n-gram inference. Experiments on three handwritten datasets demonstrate that the proposed NGI significantly reduces the performance gap between source and target corpora.
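The abstract describes estimating an n-gram model on text-only data close to the target distribution and injecting it early into the transformer decoder. As a rough, illustrative sketch of that idea (the bigram model, the projection matrix `W_lm`, and all names here are hypothetical stand-ins, not the authors' implementation), one decoding step might fuse the n-gram prediction with the decoder's input embedding like this:

```python
import numpy as np

class BigramLM:
    """Count-based bigram language model with add-one (Laplace) smoothing."""
    def __init__(self, vocab):
        self.vocab = vocab
        self.idx = {ch: i for i, ch in enumerate(vocab)}
        self.counts = np.ones((len(vocab), len(vocab)))  # Laplace prior

    def fit(self, corpus):
        # Estimated from text-only data: no image-text pairs are needed,
        # so the LM can be swapped at inference for a target-domain one.
        for text in corpus:
            for a, b in zip(text, text[1:]):
                if a in self.idx and b in self.idx:
                    self.counts[self.idx[a], self.idx[b]] += 1
        return self

    def next_logprobs(self, prev_char):
        row = self.counts[self.idx[prev_char]]
        return np.log(row / row.sum())  # shape: (|V|,)

def early_inject(step_embedding, prev_char, lm, W_lm):
    # "Early injection": project the n-gram prediction into the model
    # dimension and add it to the decoder input embedding for this step.
    lm_feature = lm.next_logprobs(prev_char) @ W_lm   # (d_model,)
    return step_embedding + lm_feature

vocab = list("abcdefghijklmnopqrstuvwxyz ")
lm = BigramLM(vocab).fit(["the quick brown fox", "hello world"])

d_model = 8
rng = np.random.default_rng(0)
W_lm = 0.01 * rng.normal(size=(len(vocab), d_model))  # learned in practice
step_embedding = np.zeros(d_model)                    # stand-in decoder input

out = early_inject(step_embedding, "h", lm, W_lm)
print(out.shape)  # (8,)
```

Because the injection happens before the decoder's attention layers rather than on the output logits, the network can learn during training to weight the LM feature, which is what distinguishes this from simple shallow fusion of scores.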
Problem

Research questions and friction points this paper is trying to address.

handwritten text recognition
language model adaptation
domain shift
transformer
n-gram
Innovation

Methods, ideas, or system contributions that make the work stand out.

n-gram injection
dynamic language model adaptation
handwritten text recognition
transformer decoder
language distribution shift