🤖 AI Summary
Handwritten text recognition must cope with ambiguity arising from vastly different writing styles, yet state-of-the-art methods do not explicitly use information about the writer's style, which may limit recognition accuracy. To address this, we propose the Writer Style Block (WSB), an adaptive instance normalization layer conditioned on learned embeddings of text partitions likely written by a single author, so that writer identity becomes an explicit model input. The partition embeddings are contrastively pre-trained, and various placements and settings of the WSB are explored. Experiments show that WSB outperforms a style-agnostic baseline in the writer-dependent setting and that embeddings can be estimated for new, unseen writers. However, simple domain-adaptive fine-tuning in the writer-independent setting achieves superior accuracy at a similar computational cost. The core contribution is the coupling of learnable writer embeddings with conditional normalization in a lightweight, style-adaptive architecture; training stability and embedding regularization need further work for it to surpass the fine-tuning baseline.
📝 Abstract
One of the challenges of handwriting recognition is transcribing a large number of vastly different writing styles. State-of-the-art approaches do not explicitly use information about the writer's style, which may limit overall accuracy due to various ambiguities. We explore models with writer-dependent parameters which take the writer's identity as an additional input. The proposed models can be trained on datasets with partitions likely written by a single author (e.g. a single letter, diary, or chronicle). We propose a Writer Style Block (WSB), an adaptive instance normalization layer conditioned on learned embeddings of the partitions. We experimented with various placements and settings of the WSB and with contrastively pre-trained embeddings. We show that our approach outperforms a baseline with no WSB in a writer-dependent scenario and that it is possible to estimate embeddings for new writers. However, domain adaptation using simple fine-tuning in a writer-independent setting provides superior accuracy at a similar computational cost. The proposed approach should be further investigated in terms of training stability and embedding regularization to surpass this baseline.
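As a rough sketch of the idea (not the paper's actual implementation), a writer-conditioned adaptive instance normalization layer can be thought of as standard instance normalization whose per-channel scale and shift are predicted from a learned writer embedding. The function name, shapes, and linear projections below are illustrative assumptions:

```python
import numpy as np

def writer_style_block(x, writer_emb, W_gamma, W_beta, eps=1e-5):
    """Hypothetical sketch of a Writer Style Block (WSB).

    Instance-normalizes a feature map per channel, then applies an
    affine transform whose parameters are predicted from a writer
    (partition) embedding -- the core of adaptive instance norm.

    x          : feature map, shape (C, H, W)
    writer_emb : learned writer embedding, shape (D,)
    W_gamma    : assumed projection to per-channel scales, shape (C, D)
    W_beta     : assumed projection to per-channel shifts, shape (C, D)
    """
    # Per-channel instance normalization over the spatial dimensions.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / np.sqrt(var + eps)

    # Writer-conditioned affine parameters (one scale/shift per channel).
    gamma = W_gamma @ writer_emb  # shape (C,)
    beta = W_beta @ writer_emb    # shape (C,)
    return gamma[:, None, None] * x_norm + beta[:, None, None]
```

In this formulation, swapping the writer embedding re-styles the normalized features without touching the backbone weights, which is what makes estimating an embedding for a new writer plausible at inference time.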