MFE-GAN: Efficient GAN-based Framework for Document Image Enhancement and Binarization with Multi-scale Feature Extraction

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low training and inference efficiency caused by training an independent GAN for each color channel in degraded color document image enhancement and binarization, this paper proposes an efficient, lightweight end-to-end GAN framework. The method introduces: (1) a multi-scale feature extraction module that combines the Haar wavelet transform with normalization to explicitly model cross-scale texture and structural information; (2) parameter-efficient generator and discriminator architectures; and (3) a joint adversarial loss coupled with an adaptive reconstruction loss. Experiments on the Benchmark, Nabuco, and CMATERdb datasets show that the approach significantly reduces total training and inference time while remaining comparable to state-of-the-art methods on PSNR, F-Score, and other quantitative metrics. The framework thus balances computational efficiency against restoration accuracy.
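
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how PSNR and F-Score are commonly computed for binarization output. It is not taken from the paper: the function names `psnr` and `f_measure`, the `max_val` parameter, and the convention that `True` marks text pixels are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_val is the largest possible pixel value (assumed 1.0 here).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def f_measure(pred, target):
    """Harmonic mean of precision and recall over text pixels.

    Assumption: True marks text (foreground) pixels in both arrays.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)    # text pixels found correctly
    fp = np.sum(pred & ~target)   # background marked as text
    fn = np.sum(~pred & target)   # text pixels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))
```

Higher values are better for both metrics. PSNR penalizes every mismatched pixel equally, while the F-Score focuses on how well the text pixels are recovered.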

📝 Abstract
Document image enhancement and binarization are commonly performed prior to document analysis and recognition tasks for improving the efficiency and accuracy of optical character recognition (OCR) systems. This is because directly recognizing text in degraded documents, particularly in color images, often results in unsatisfactory recognition performance. To address these issues, existing methods train independent generative adversarial networks (GANs) for different color channels to remove shadows and noise, which, in turn, facilitates efficient text information extraction. However, deploying multiple GANs results in long training and inference times. To reduce both training and inference times of document image enhancement and binarization models, we propose MFE-GAN, an efficient GAN-based framework with multi-scale feature extraction (MFE), which incorporates Haar wavelet transformation (HWT) and normalization to process document images before feeding them into GANs for training. In addition, we present novel generators, discriminators, and loss functions to improve the model's performance, and we conduct ablation studies to demonstrate their effectiveness. Experimental results on the Benchmark, Nabuco, and CMATERdb datasets demonstrate that the proposed MFE-GAN significantly reduces the total training and inference times while maintaining comparable performance with respect to state-of-the-art (SOTA) methods. The implementation of this work is available at https://ruiyangju.github.io/MFE-GAN.
Problem

Research questions and friction points this paper is trying to address.

Enhances degraded document images for OCR
Reduces training and inference time of GANs
Improves binarization with multi-scale feature extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

MFE-GAN framework with multi-scale feature extraction
Haar wavelet transformation for preprocessing document images
Novel generators, discriminators, and loss functions
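
The Haar wavelet preprocessing listed above can be illustrated with a short sketch. This is a generic single-level 2D Haar transform followed by min-max normalization, written with NumPy under stated assumptions. It is not the authors' implementation: the function names `haar_dwt2` and `min_max_normalize`, the choice of min-max scaling, and the sub-band naming are all hypothetical, and this summary does not say which sub-bands or how many decomposition levels MFE-GAN uses.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of a grayscale image.

    Returns four half-resolution sub-bands:
    (approximation, column_detail, row_detail, diagonal_detail).
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0      # low-pass in both directions
    col_detail = (a - b + c - d) / 2.0  # difference between adjacent columns
    row_detail = (a + b - c - d) / 2.0  # difference between adjacent rows
    diag_detail = (a - b - c + d) / 2.0 # diagonal difference
    return approx, col_detail, row_detail, diag_detail

def min_max_normalize(band, eps=1e-12):
    """Rescale a sub-band to [0, 1]; a constant band maps to all zeros."""
    lo, hi = band.min(), band.max()
    if hi - lo < eps:
        return np.zeros_like(band)
    return (band - lo) / (hi - lo)

# Usage: decompose one color channel and normalize its approximation band.
channel = np.arange(16, dtype=np.float64).reshape(4, 4)
approx, col_detail, row_detail, diag_detail = haar_dwt2(channel)
approx_norm = min_max_normalize(approx)  # shape (2, 2), values in [0, 1]
```

Each sub-band has half the height and width of its input, so a network that consumes it processes a quarter of the pixels. Applying the transform again to the approximation band yields the next coarser scale.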