Invizo: Arabic Handwritten Document Optical Character Recognition Solution

📅 2025-02-07
🤖 AI Summary
This study addresses the end-to-end OCR challenge for Arabic handwritten and printed text, including numerals, by proposing a robust document parsing framework integrating CNNs and Transformers. Methodologically, it employs multi-scale text detection and adaptive binarization for improved localization robustness; combines CNN-based feature extraction with Transformer-based sequential modeling to capture handwriting variability; and introduces a joint CTC-Attention decoding mechanism to uniformly handle mixed handwritten, printed, and numeric text. Experiments yield a text detection F-measure of 79.07% and character error rates (CER) of 0.59% on printed text and 7.91% on handwritten text. The authors report that the system performs reliably on real-world receipts and document images. Its core contribution is an end-to-end Arabic OCR system explicitly designed to accommodate the high diversity of handwritten forms while producing structured output.

📝 Abstract
Converting images of Arabic text into plain text is a widely researched topic in academia and industry. However, recognizing Arabic handwritten and printed text remains challenging because of the complex variations of the Arabic script. This work proposes an end-to-end solution that recognizes Arabic handwritten text, printed text, and Arabic numbers, and presents the extracted data in a structured manner. We reached 81.66% precision, 78.82% recall, and 79.07% F-measure on the text detection task that powers the proposed solution. The proposed recognition model combines state-of-the-art CNN-based feature extraction with Transformer-based sequence modeling to accommodate variations in handwriting styles, stroke thicknesses, alignments, and noise conditions. Evaluation shows strong performance on both printed and handwritten text, yielding 0.59% CER and 1.72% WER on printed text, and 7.91% CER and 31.41% WER on handwritten text. The overall solution has proven reliable in real-life OCR tasks. It is equipped with both detection and recognition models, as well as supporting feature extraction and matching algorithms. Its general-purpose implementation makes the solution applicable to any document or receipt containing handwritten or printed Arabic text, and therefore practical across a wide range of contexts.
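The abstract reports results as character error rate (CER) and word error rate (WER). As a minimal sketch of how these metrics are commonly defined (edit distance between reference and prediction, divided by the reference length), the snippet below may help in reading the numbers. It is not the authors' evaluation code; the function names `edit_distance`, `cer`, and `wer` are illustrative assumptions, and the paper may apply text normalization that is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items yields the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Illustrative example (hypothetical strings, not taken from the paper's data):
# a single wrong character in a two-word reference.
reference = "كتاب جديد"
hypothesis = "كتاب جديت"
print(f"CER = {cer(reference, hypothesis):.3f}, WER = {wer(reference, hypothesis):.3f}")
```

Because one wrong character makes its entire word count as an error, WER is typically much higher than CER on the same output, which is consistent with the gap between the handwritten CER and WER figures reported above.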
Problem

Research questions and friction points this paper is trying to address.

Arabic handwritten OCR solution
Complex script variations handling
High accuracy text recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

CNN-based feature extraction
Transformer-based sequence modeling
End-to-end Arabic OCR solution
Alhossien Waly
Egypt-Japan University of Science and Technology (E-JUST)
Artificial Intelligence, Computer Vision
Bassant Tarek
Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology, Alexandria, Egypt
Ali Feteha
Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology, Alexandria, Egypt
Rewan Yehia
Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology, Alexandria, Egypt
Gasser Amr
Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology, Alexandria, Egypt
Walid Gomaa
Egypt Japan University of Science and Technology
Artificial Intelligence and Theoretical Computer Science
Ahmed Fares
Assoc. Prof. of Computer Sci. and Eng., E-JUST
Neuroinformatics, Bioinformatics, Neuroscience, Machine learning, Deep learning