Towards Khmer Scene Document Layout Detection

📅 2026-02-28
🤖 AI Summary
This study addresses layout analysis for Khmer scene documents, where scarce annotated data, complex backgrounds, and perspective distortions prevent existing Latin-based models from accurately detecting dense text regions. The authors present what they describe as the first comprehensive study of Khmer scene document layout detection and contribute a dedicated training and benchmarking dataset. They also provide an open-source augmentation tool that synthesizes realistic scene documents, along with YOLO-based baselines that use oriented bounding boxes (OBBs) to handle geometric distortions and the structural characteristics of Khmer script, such as diacritics and multi-layer character stacking. The dataset, code, and trained models are released through a gated repository (in review) to support further research on document analysis for low-resource languages.

📝 Abstract
While document layout analysis for Latin scripts has advanced significantly, driven by the advent of large multimodal models (LMMs), progress for the Khmer language remains constrained because of the scarcity of annotated training data. This gap is particularly acute for scene documents, where perspective distortions and complex backgrounds challenge traditional methods. Given the structural complexities of Khmer script, such as diacritics and multi-layer character stacking, existing Latin-based layout analysis models fail to accurately delineate semantic layout units, particularly for dense text regions (e.g., list items). In this paper, we present the first comprehensive study on Khmer scene document layout detection. We contribute a novel framework comprising three key elements: (1) a robust training and benchmarking dataset specifically for Khmer scene layouts; (2) an open-source document augmentation tool capable of synthesizing realistic scene documents to scale training data; and (3) layout detection baselines utilizing YOLO-based architectures with oriented bounding boxes (OBB) to handle geometric distortions. To foster further research in the Khmer document analysis and recognition (DAR) community, we release our models, code, and datasets in this gated repository (in review).
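To illustrate why oriented annotations matter for scene documents, the sketch below warps a flat layout box with a homography and compares the resulting quadrilateral with its axis-aligned bounding box. It is a minimal illustration, not the authors' augmentation tool or label format: the box coordinates, the homography values, and the helper names (`warp_points`, `polygon_area`, `axis_aligned_box`) are all assumptions chosen for the example.

```python
import numpy as np


def warp_points(points, H):
    """Apply a 3x3 homography H to an (N, 2) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    warped = homogeneous @ H.T
    return warped[:, :2] / warped[:, 2:3]


def polygon_area(points):
    """Shoelace formula for a simple polygon given as ordered (N, 2) vertices."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def axis_aligned_box(points):
    """Corners of the smallest axis-aligned rectangle containing the points."""
    x0, y0 = points.min(axis=0)
    x1, y1 = points.max(axis=0)
    return np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]])


# Hypothetical list-item region on a flat page: 400 x 40 pixels.
flat_box = np.array(
    [[100, 200], [500, 200], [500, 240], [100, 240]], dtype=float
)

# Illustrative homography: a 15 degree rotation, a translation,
# and a mild perspective term (all values are assumptions).
theta = np.deg2rad(15)
H = np.array(
    [
        [np.cos(theta), -np.sin(theta), 30.0],
        [np.sin(theta), np.cos(theta), 10.0],
        [1e-4, 0.0, 1.0],
    ]
)

# The same transform that warps the image is applied to the annotation.
quad = warp_points(flat_box, H)
aabb = axis_aligned_box(quad)

# Fraction of the axis-aligned box that the region actually occupies.
fill_ratio = polygon_area(quad) / polygon_area(aabb)
print(f"region fills {fill_ratio:.0%} of its axis-aligned box")
```

For a thin, tilted region such as a list item, the axis-aligned box is mostly background and tends to overlap neighbouring items, whereas an oriented box that follows the warped corners stays close to the text.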
Problem

Research questions and friction points this paper is trying to address.

Khmer script
scene document layout detection
document layout analysis
annotated training data scarcity
geometric distortions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Khmer document layout
scene document synthesis
oriented bounding box
data augmentation
YOLO-based layout detection
Marry Kong
Techo Startup Center, Ministry of Economy and Finance, Cambodia
Rina Buoy
Techo Startup Center, Ministry of Economy and Finance, Cambodia
Sovisal Chenda
Techo Startup Center, Ministry of Economy and Finance, Cambodia
Nguonly Taing
Techo Startup Center, Ministry of Economy and Finance, Cambodia
Masakazu Iwamura
Osaka Metropolitan University
Koichi Kise
Professor of Graduate School of Informatics, Osaka Metropolitan University
Document Image Analysis, Computer Vision, Human Sensing and Actuation