Bengali Document Layout Analysis - A YOLOV8 Based Ensembling Approach

📅 2023-09-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
Problem: Bengali document layout analysis (DLA) suffers from low recognition accuracy because of complex script characteristics, including touching characters and multi-scale layout structures.
Method: We propose a two-stage prediction and model ensemble framework based on YOLOv8. It combines three components: customized data augmentation (e.g., font deformation and background synthesis) tailored to the BaDLAD dataset; a connected-component post-processing module, guided by geometric rules, that mitigates text block merging and false detections; and a soft-voting ensemble strategy that improves robustness.
Contribution/Results: This work presents the first systematic adaptation and optimization of YOLOv8 for Bengali DLA. Experiments on BaDLAD achieve a layout element F1-score of 92.7%, significantly outperforming single-model baselines. The approach delivers a high-accuracy, production-ready layout parsing solution for Bengali OCR and document understanding.
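To make the soft-voting idea in the summary concrete, here is a minimal sketch in plain Python. It assumes each model outputs a per-pixel probability mask for one layout class; the function name `soft_vote`, the optional `weights`, and the 0.5 threshold are illustrative assumptions, not details taken from the paper, which may combine predictions differently.

```python
from typing import List, Optional

ProbMask = List[List[float]]  # per-pixel probabilities in [0, 1]


def soft_vote(
    prob_masks: List[ProbMask],
    weights: Optional[List[float]] = None,  # hypothetical per-model weights
    threshold: float = 0.5,                 # assumed decision threshold
) -> List[List[int]]:
    """Average per-pixel probabilities across models, then threshold."""
    if not prob_masks:
        raise ValueError("need at least one probability mask")
    if weights is None:
        weights = [1.0] * len(prob_masks)
    if len(weights) != len(prob_masks):
        raise ValueError("one weight per mask is required")

    total = sum(weights)
    height, width = len(prob_masks[0]), len(prob_masks[0][0])
    fused = []
    for r in range(height):
        row = []
        for c in range(width):
            # Weighted mean of the models' confidence at this pixel.
            avg = sum(w * m[r][c] for w, m in zip(weights, prob_masks)) / total
            row.append(1 if avg >= threshold else 0)
        fused.append(row)
    return fused
```

A pixel is kept only when the models' average confidence reaches the threshold, so a single over-confident model cannot force a detection on its own.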
📝 Abstract
This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous validation set evaluation, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures, addressing issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension. BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights to incorporate new strategies into the established solution.
Problem

Research questions and friction points this paper is trying to address.

Enhancing Bengali document layout analysis using YOLOv8
Addressing complex Bengali script segmentation challenges
Improving OCR accuracy through ensemble post-processing techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

YOLOv8 model with post-processing techniques
Data augmentation for complex Bengali script
Two-stage prediction strategy for segmentation
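
As an illustration of what connected-component post-processing can look like, the sketch below removes small isolated regions from a binary segmentation mask. It is a simplified stand-in: the page does not spell out the paper's geometric rules, so the single `min_area` rule, the 4-connectivity, and the function name `filter_small_components` are all assumptions made for this example.

```python
from collections import deque
from typing import List


def filter_small_components(mask: List[List[int]], min_area: int) -> List[List[int]]:
    """Keep only 4-connected foreground regions with at least min_area pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    cleaned = [[0] * width for _ in range(height)]

    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumed rule: components below min_area count as false detections.
            if len(component) >= min_area:
                for y, x in component:
                    cleaned[y][x] = 1
    return cleaned
```

In practice a library routine such as OpenCV's connected-component labeling would replace the hand-written search, and further rules (aspect ratio, distance between blocks) could be added to split merged text blocks.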