🤖 AI Summary
To address low OCR accuracy on low-resolution document scans, this paper proposes a task-driven single-image super-resolution (SR) method that explicitly incorporates text detection priors into SR model training. The core contribution is an OCR-oriented multi-task loss function that jointly optimizes text structural fidelity—enforcing edge preservation and character contour integrity—and perceptual image quality—measured via pixel-level and feature-level similarity. By aligning SR reconstruction with downstream OCR requirements, the method constrains the ill-posed SR problem, whose conventional, purely perceptual formulations often fail to improve recognition robustness. Experiments on real-world document images demonstrate significant OCR accuracy improvements, especially under severe resolution constraints. These results indicate that task-aware super-resolution enhances both the effectiveness and the practicality of intelligent document analysis.
📝 Abstract
Super-resolution reconstruction aims to generate images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned by deep learning achieve outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way, so as to better adapt them to text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that combines components related to text detection with those guided by image similarity. The results reported in this paper are encouraging and constitute an important step towards real-world super-resolution of document images.
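The abstract describes a multi-task loss that combines image-similarity terms with text-oriented terms, but does not give its exact form. As a hypothetical illustration only (the term definitions and weights below are assumptions, not the paper's formulation), such a loss could pair an L1 pixel term with an edge-map term that stands in for character-contour preservation:

```python
import numpy as np

def pixel_loss(sr, hr):
    # Image-similarity component: mean absolute (L1) pixel difference.
    return float(np.mean(np.abs(sr - hr)))

def edge_map(img):
    # Gradient magnitude as a crude stand-in for character contours / edges.
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)

def text_structure_loss(sr, hr):
    # Text-oriented component: penalize mismatch between edge maps,
    # encouraging preservation of character contours.
    return float(np.mean(np.abs(edge_map(sr) - edge_map(hr))))

def multi_task_loss(sr, hr, alpha=1.0, beta=0.5):
    # Weighted combination; alpha and beta are illustrative assumptions.
    return alpha * pixel_loss(sr, hr) + beta * text_structure_loss(sr, hr)
```

In practice the image-similarity side would also include a feature-level (perceptual) term and the text-oriented side would come from a differentiable text detector, as the abstract suggests; this sketch only shows the weighted multi-task structure.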