QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation

📅 2025-06-02

🤖 AI Summary

Arabic OCR faces inherent challenges, including cursive script, diacritical marks (tashkeel), and variability in fonts and layouts. To address these, we introduce QARI, a series of open-source, Arabic-specific multimodal OCR models built on Qwen2-VL-2B-Instruct. The method uses iterative fine-tuning on synthetic data, integrating vision–language joint alignment, Arabic-customized tokenization, and rule-aware post-processing. QARI v0.2 achieves open-source state-of-the-art results on diacritically rich text (WER = 0.160, CER = 0.061, BLEU = 0.737), with notable gains in tashkeel recognition and robustness to low-resolution inputs. A further variant, QARI v0.3, shows potential for document structure understanding and handwritten Arabic text recognition. The models and datasets are released openly to support reproducibility and further research.

📝 Abstract

The inherent complexities of Arabic script, namely its cursive nature, diacritical marks (tashkeel), and varied typography, pose persistent challenges for Optical Character Recognition (OCR). We present Qari-OCR, a series of vision-language models derived from Qwen2-VL-2B-Instruct, progressively optimized for Arabic through iterative fine-tuning on specialized synthetic datasets. Our leading model, QARI v0.2, establishes a new open-source state-of-the-art with a Word Error Rate (WER) of 0.160, Character Error Rate (CER) of 0.061, and BLEU score of 0.737 on diacritically rich texts. Qari-OCR demonstrates superior handling of tashkeel, diverse fonts, and document layouts, alongside impressive performance on low-resolution images. Further explorations (QARI v0.3) showcase strong potential for structural document understanding and handwritten text. This work delivers a marked improvement in Arabic OCR accuracy and efficiency, with all models and datasets released to foster further research.
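
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and OCR output, taken over words or characters and divided by the reference length. The function names are illustrative, and the abstract does not state which tokenization or text normalization was used for the reported scores, so whitespace splitting and raw code points are assumptions here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a WER of 0.160 corresponds to roughly 16 word-level edits per 100 reference words. Because each tashkeel mark is its own Unicode code point, a single wrong diacritic counts as one character error and also makes the whole word count as a word error.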

Problem

Research questions and friction points this paper is trying to address.

Overcoming Arabic script complexities for accurate OCR

Enhancing diacritical mark recognition in Arabic texts

Improving OCR performance on low-resolution Arabic documents
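
To make the diacritics problem concrete, here is a small sketch that separates tashkeel from base letters using Unicode character categories. It is an illustration only, not the paper's evaluation procedure. The helper names are hypothetical, and treating every nonspacing mark (category `Mn`) as tashkeel is a simplifying assumption that is broader than the Arabic harakat range alone.

```python
import unicodedata


def strip_tashkeel(text):
    """Return text without nonspacing combining marks (Unicode category Mn).

    Assumption: in Arabic text these marks are the tashkeel (fatha, damma,
    kasra, sukun, shadda, tanween, and similar).
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def count_tashkeel(text):
    """Count nonspacing combining marks in the text."""
    return sum(1 for ch in text if unicodedata.category(ch) == "Mn")


# The word "kataba" written as three letters, each followed by a fatha (U+064E).
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = strip_tashkeel(vocalized)
print(len(vocalized), len(bare), count_tashkeel(vocalized))
```

Scoring OCR output once against the fully vocalized reference and once after stripping marks from both sides is one way to estimate how much of an error rate comes from diacritics rather than base letters.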

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Model Adaptation

Iterative fine-tuning on synthetic datasets

Superior handling of tashkeel and fonts
