Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR

📅 2025-08-29

🤖 AI Summary

To address the frequent character and word segmentation errors and the limited contextual modeling of traditional OCR, this paper proposes a line-level OCR paradigm that bypasses explicit character and word segmentation and recognizes full text lines end to end. Methodologically, the authors introduce a unified sequence-to-sequence framework that integrates object detection with deep language modeling. They provide the first systematic empirical validation of the advantages of line-level modeling and release LineOCR, the first finely annotated dataset designed specifically for line-level training and evaluation (251 page images of English documents). Experiments show a 5.4% improvement in end-to-end accuracy and a 4× improvement in efficiency over word-based pipelines, easing the bottlenecks inherent in conventional “segment-then-recognize” pipelines. The work moves OCR toward a unified perception-and-understanding paradigm.

📝 Abstract

Conventional optical character recognition (OCR) techniques segmented each character and then recognized it. This made them prone to errors in character segmentation and left them without the context needed to exploit language models. Advances in sequence-to-sequence translation over the last decade led to modern techniques that first detect words and then feed one word at a time to a model that directly outputs the full word as a sequence of characters. This allowed better use of language models and bypassed the error-prone character segmentation step. We observe that this transition has moved the accuracy bottleneck to word segmentation. Hence, in this paper, we propose a natural and logical progression from word-level OCR to line-level OCR. The proposal bypasses errors in word detection and provides larger sentence context for better use of language models. We show that the proposed technique improves not only the accuracy but also the efficiency of OCR. Despite a thorough literature survey, we did not find any public dataset for training and benchmarking such a shift from word-level to line-level OCR. Hence, we also contribute a meticulously curated dataset of 251 English page images with line-level annotations. Our experiments revealed a notable end-to-end accuracy improvement of 5.4%, underscoring the potential benefits of transitioning to line-level OCR, especially for document images. We also report a 4 times improvement in efficiency compared to word-based pipelines. With continuous improvements in large language models, our methodology also holds potential to exploit such advances. Project Website: https://nishitanand.github.io/line-level-ocr-website
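
The structural difference between the two pipelines described in the abstract can be illustrated with a toy sketch. It does not reproduce the paper's model or its reported numbers: `recognize_stub`, `word_level_ocr`, `line_level_ocr`, and the string-based `Page` representation are all hypothetical names introduced here, and the recognizer is an identity function standing in for a real sequence-to-sequence model. The sketch only shows that a word-level pipeline invokes the recognizer once per detected word, while a line-level pipeline invokes it once per detected line.

```python
from typing import Callable, List, Tuple

# Hypothetical page representation: a list of lines, each a list of word strings.
# Strings stand in for image crops; no real detection or recognition happens here.
Page = List[List[str]]


def recognize_stub(crop: str) -> str:
    """Hypothetical recognizer: returns its input unchanged."""
    return crop


def word_level_ocr(page: Page, recognize: Callable[[str], str]) -> Tuple[str, int]:
    """Recognize one word crop at a time, then rejoin words into lines."""
    calls = 0
    out_lines = []
    for words in page:
        recognized = []
        for word in words:
            recognized.append(recognize(word))
            calls += 1
        out_lines.append(" ".join(recognized))
    return "\n".join(out_lines), calls


def line_level_ocr(page: Page, recognize: Callable[[str], str]) -> Tuple[str, int]:
    """Recognize one whole-line crop at a time; no word detection step."""
    calls = 0
    out_lines = []
    for words in page:
        out_lines.append(recognize(" ".join(words)))
        calls += 1
    return "\n".join(out_lines), calls


page = [["Why", "stop", "at", "words?"], ["Line-level", "OCR"]]
word_text, word_calls = word_level_ocr(page, recognize_stub)
line_text, line_calls = line_level_ocr(page, recognize_stub)
print(word_calls, line_calls)
```

With this two-line toy page the word-level path makes six recognizer calls and the line-level path makes two. In a real system, each word-level call also depends on a correct word box, which is the error source the abstract says line-level OCR avoids.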

Problem

Research questions and friction points this paper is trying to address.

Addresses word segmentation errors in OCR systems

Proposes line-level OCR to bypass word detection issues

Enhances language model context and improves accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Line-level OCR bypasses word segmentation errors

Leverages larger context for better language model usage

Improves both accuracy and efficiency over word-based methods
