Evaluating the Impact of Khmer Font Types on Text Recognition

📅 2025-06-30
🤖 AI Summary
This study addresses the challenge of optical character recognition (OCR) for complex scripts by systematically evaluating the impact of font type on Khmer-language OCR performance. Using Pytesseract, we conducted standardized OCR benchmarking across 19 randomly selected Khmer fonts on realistic text samples, which constitutes the first quantitative, multi-font evaluation in authentic Khmer document contexts. Results demonstrate that font design strongly influences recognition accuracy: Khmer and Odor MeanChey achieve top performance (mean accuracy >92%), whereas iSeth First and Bayon yield substantially lower accuracy (<75%). The analysis reveals systematic correlations between OCR robustness and typographic features such as stroke continuity, glyph distinctiveness, and inter-character spacing. This work provides empirically grounded guidance for font selection in Khmer digital archiving and contributes a methodological framework for font-aware OCR optimization in complex-script languages.

📝 Abstract
Text recognition is significantly influenced by font types, especially for complex scripts like Khmer. The variety of Khmer fonts, each with its unique character structure, presents challenges for optical character recognition (OCR) systems. In this study, we evaluate the impact of 19 randomly selected Khmer font types on text recognition accuracy using Pytesseract. The fonts include Angkor, Battambang, Bayon, Bokor, Chenla, Dangrek, Freehand, Kh Kompong Chhnang, Kh SN Kampongsom, Khmer, Khmer CN Stueng Songke, Khmer Savuth Pen, Metal, Moul, Odor MeanChey, Preah Vihear, Siemreap, Sithi Manuss, and iSeth First. Our comparison of OCR performance across these fonts reveals that Khmer, Odor MeanChey, Siemreap, Sithi Manuss, and Battambang achieve high accuracy, while iSeth First, Bayon, and Dangrek perform poorly. This study underscores the critical importance of font selection in optimizing Khmer text recognition and provides valuable insights for developing more robust OCR systems.
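The per-font comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not state which accuracy metric was used, so character-level accuracy based on edit distance over Unicode code points is an assumption here, and the names `levenshtein`, `char_accuracy`, `rank_fonts`, and `tesseract_khmer` are hypothetical helpers. The only real library call is `pytesseract.image_to_string`, which needs a local Tesseract install with Khmer (`khm`) language data.

```python
from statistics import mean
from typing import Any, Callable, Dict, Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))


def rank_fonts(
    samples: Iterable[Tuple[str, Any, str]],
    recognize: Callable[[Any], str],
) -> List[Tuple[str, float]]:
    """Group (font, image, reference_text) samples by font and sort by mean accuracy."""
    scores: Dict[str, List[float]] = {}
    for font, image, reference in samples:
        scores.setdefault(font, []).append(char_accuracy(reference, recognize(image)))
    return sorted(((font, mean(vals)) for font, vals in scores.items()),
                  key=lambda item: item[1], reverse=True)


def tesseract_khmer(image: Any) -> str:
    """OCR one image with Tesseract's Khmer model (requires the 'khm' traineddata)."""
    import pytesseract  # imported lazily so the metric code runs without Tesseract
    return pytesseract.image_to_string(image, lang="khm")
```

Passing `tesseract_khmer` as `recognize`, with one or more rendered images per font, would yield a ranking of fonts from highest to lowest mean accuracy. Because Khmer uses stacked consonants and combining marks, a code-point-level score may weight errors differently from a grapheme-cluster-level score, so the choice of unit is worth stating alongside any reported numbers.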
Problem

Research questions and friction points this paper is trying to address.

Evaluating Khmer font impact on OCR accuracy
Comparing 19 Khmer fonts for text recognition
Identifying best and worst fonts for OCR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates Khmer font impact on OCR
Uses Pytesseract for text recognition
Identifies high and low accuracy fonts
Vannkinh Nom
La Rochelle University, Laboratoire Informatique Image Interaction (L3i)
Souhail Bakkali
L3i, La Rochelle Université
Computer Vision, Pattern Recognition, Document Analysis and Understanding
Muhammad Muzzamil Luqman
La Rochelle University, Laboratoire Informatique Image Interaction (L3i)
Mickael Coustaty
La Rochelle University, Laboratoire Informatique Image Interaction (L3i)
Jean-Marc Ogier
Professor of Computer Science, Université de la Rochelle, President (Rector) of the University of la
Document analysis, pattern recognition, content-based indexing