🤖 AI Summary
This survey addresses the robustness challenges in Handwritten Text Recognition (HTR) arising from high variability in handwriting styles, document layouts, and image quality. It systematically traces the evolution from heuristic approaches to end-to-end, document-level deep learning models. We propose the first unified analytical framework for HTR—comprehensively covering methodologies, benchmark evaluations, mainstream datasets, and performance comparisons—while explicitly defining recognition granularities: line-level and supra-line-level (i.e., paragraph- or document-level). The framework integrates CNNs, RNNs, Transformers, and attention mechanisms, synergizing sequence modeling strategies such as Connectionist Temporal Classification (CTC) and attention-based Seq2Seq to support multi-granularity recognition. Synthesizing over 100 works, we clarify the technical trajectory and identify core open challenges: cross-domain generalization, model interpretability, and system-level robustness. This work establishes a foundational theoretical and practical reference for developing real-world document understanding systems.
📝 Abstract
Handwritten Text Recognition (HTR) has become an essential field within pattern recognition and machine learning, with applications spanning historical document preservation to modern data entry and accessibility solutions. The complexity of HTR lies in the high variability of handwriting, which makes it challenging to develop robust recognition systems. This survey examines the evolution of HTR models, tracing their progression from early heuristic-based approaches to contemporary state-of-the-art neural models, which leverage deep learning techniques. The scope of the field has also expanded, with models initially capable of recognizing only word-level content progressing to recent end-to-end document-level approaches. Our paper categorizes existing work into two primary levels of recognition: (1) emph{up to line-level}, encompassing word and line recognition, and (2) emph{beyond line-level}, addressing paragraph- and document-level challenges. We provide a unified framework that examines research methodologies, recent advances in benchmarking, key datasets in the field, and a discussion of the results reported in the literature. Finally, we identify pressing research challenges and outline promising future directions, aiming to equip researchers and practitioners with a roadmap for advancing the field.