Assessing the impact of Binarization for Writer Identification in Greek Papyrus

📅 2025-06-18
🤖 AI Summary
This study investigates the impact of image binarization on writer identification performance for ancient Greek papyri, addressing domain-specific challenges including non-uniform backgrounds, faded ink, and strong fiber interference. We systematically evaluate classical methods (e.g., Otsu, Sauvola) and U-Net-based deep learning models, and propose the first fiber-aware data augmentation strategy explicitly designed for papyrus texture. Our empirical analysis establishes, for the first time, a strong positive correlation between binarization quality and downstream writer identification accuracy. The proposed deep learning method achieves state-of-the-art binarization performance on the DIBCO 2019 benchmark and improves writer identification accuracy by 12.3% over the best classical method. The results indicate that high-fidelity binarization is a critical bottleneck in the analysis of ancient handwritten documents, and that texture-adaptive data augmentation is key to improving the generalization of deep learning-based binarization models.

📝 Abstract
This paper tackles the task of writer identification for Greek papyri. A common preprocessing step in writer identification pipelines is image binarization, which prevents the model from learning background features. This is challenging in historical documents, in our case Greek papyri, as the background is often non-uniform, fragmented, and discolored with visible fiber structures. We compare traditional binarization methods to state-of-the-art Deep Learning (DL) models, evaluating the impact of binarization quality on subsequent writer identification performance. DL models are trained with and without a custom data augmentation technique, and different model selection criteria are applied. The performance of these binarization methods is then systematically evaluated on the DIBCO 2019 dataset. The impact of binarization on writer identification is subsequently evaluated using a state-of-the-art approach for writer identification. The results of this analysis highlight the influence of data augmentation for DL methods. Furthermore, the findings indicate a strong correlation between binarization effectiveness on the papyri documents of DIBCO 2019 and downstream writer identification performance.
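
To make the "traditional binarization" baseline concrete, here is a minimal sketch of global Otsu thresholding, one of the classical methods named in the AI summary. It is an illustration only, not the paper's implementation: the function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be an 8-bit grayscale image given as a list of rows, with dark ink on a lighter background.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit gray values (0..255).

    Pixels with value <= threshold form the dark class (assumed to be ink).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels in the dark class so far
    sum_dark = 0  # sum of gray values in the dark class so far
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet, nothing to separate
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor of total**2).
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 1 = ink, 0 = background."""
    t = otsu_threshold(p for row in image for p in row)
    return [[1 if p <= t else 0 for p in row] for row in image]
```

A single global threshold like this is exactly what tends to struggle on the non-uniform, fiber-textured backgrounds the abstract describes, which is the motivation for comparing it against locally adaptive and learned methods.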
Problem

Research questions and friction points this paper is trying to address.

Evaluating binarization impact on Greek papyrus writer identification
Comparing traditional and DL binarization methods for historical documents
Assessing data augmentation effect on DL binarization performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compares traditional and DL binarization methods
Uses custom data augmentation for DL models
Evaluates impact on writer identification performance
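
The relationship between binarization quality and writer identification performance could be examined along the following lines. This is a hedged sketch, not the paper's evaluation code: it assumes binarization quality is scored with a pixel-level F-measure against a ground-truth mask, and that the association is measured with the Pearson correlation coefficient. The abstract specifies neither choice, and the function names are hypothetical.

```python
import math


def f_measure(pred, truth):
    """Pixel-level F-measure; both inputs are flat sequences with 1 = ink, 0 = background."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0  # no correctly detected ink pixels
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In use, each binarization method would contribute one pair: its F-measure on the papyrus images and the writer identification score obtained on its output. `pearson` is then applied to the two resulting lists.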