Evaluating the Impact of Khmer Font Types on Text Recognition

📅 2025-06-30

📈 Citations: 0

✨ Influential: 0

career value

135K/year

🤖 AI Summary

This study addresses the challenge of optical character recognition (OCR) for complex scripts by systematically evaluating the impact of font type on Khmer-language OCR performance. Using the Pytesseract framework, we conducted standardized OCR benchmarking across 19 widely used Khmer fonts on realistic text samples—constituting the first quantitative, multi-font evaluation in authentic Khmer document contexts. Results demonstrate that font design critically influences recognition accuracy: Khmer and Odor MeanChey achieve top performance (mean accuracy >92%), whereas iSeth First and Bayon yield substantially lower accuracy (<75%). The analysis reveals systematic correlations between typographic features—such as stroke continuity, glyph distinctiveness, and inter-character spacing—and OCR robustness. This work provides empirically grounded guidance for font selection in Khmer digital archiving and contributes a methodological framework for font-aware OCR optimization in complex-script languages.

Technology Category

Application Category

📝 Abstract

Text recognition is significantly influenced by font types, especially for complex scripts like Khmer. The variety of Khmer fonts, each with its unique character structure, presents challenges for optical character recognition (OCR) systems. In this study, we evaluate the impact of 19 randomly selected Khmer font types on text recognition accuracy using Pytesseract. The fonts include Angkor, Battambang, Bayon, Bokor, Chenla, Dangrek, Freehand, Kh Kompong Chhnang, Kh SN Kampongsom, Khmer, Khmer CN Stueng Songke, Khmer Savuth Pen, Metal, Moul, Odor MeanChey, Preah Vihear, Siemreap, Sithi Manuss, and iSeth First. Our comparison of OCR performance across these fonts reveals that Khmer, Odor MeanChey, Siemreap, Sithi Manuss, and Battambang achieve high accuracy, while iSeth First, Bayon, and Dangrek perform poorly. This study underscores the critical importance of font selection in optimizing Khmer text recognition and provides valuable insights for developing more robust OCR systems.

Problem

Research questions and friction points this paper is trying to address.

Evaluating Khmer font impact on OCR accuracy

Comparing 19 Khmer fonts for text recognition

Identifying best and worst fonts for OCR

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates Khmer font impact on OCR

Uses Pytesseract for text recognition

Identifies high and low accuracy fonts

🔎 Similar Papers

No similar papers found.