🤖 AI Summary
Document shadows severely degrade digitization quality, and existing removal methods often rely on shadow masks or strong priors, which limits their generalizability and content fidelity. This paper proposes a mask-free, end-to-end shadow removal framework that, for the first time, leverages intrinsic image contrast as the core guiding signal, enabling progressive reconstruction from coarse-grained localization to fine-grained restoration. The method comprises a contrast-driven two-stage network, a contrast-aware feature fusion mechanism, and a differentiable shadow modeling and reconstruction module. Evaluated on multiple benchmark datasets, it achieves state-of-the-art performance, with higher PSNR and SSIM than prior work. Qualitative results show more complete shadow removal and better preservation of text structure, edge detail, and color fidelity.
📝 Abstract
Document shadows are a major obstacle in the digitization process. Because shadows cover dense textual and graphical information, document shadow removal requires specialized methods. Existing approaches, despite some progress, still rely on additional inputs such as shadow masks or lack generalization and effectiveness across different shadow scenarios, which often results in incomplete shadow removal or loss of the original document's content and tones. Moreover, these methods tend to underutilize the information already present in the shadowed document image. In this paper, we refocus on the document images themselves, which inherently contain rich information. We propose an end-to-end document shadow removal method guided by contrast representation, following a coarse-to-fine refinement approach. By extracting document contrast information, we can effectively and quickly locate shadow shapes and positions without additional masks. This information is then integrated into the refined shadow removal process, providing better guidance for network-based removal and feature fusion. Extensive qualitative and quantitative experiments show that our method achieves state-of-the-art performance.
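To make the contrast-guided localization idea concrete, below is a minimal sketch of how a contrast signal could coarsely locate shadows in a document image without a mask. It assumes a simple Weber-style local contrast (local standard deviation over local mean); the paper's actual contrast representation and refinement stages are learned end-to-end, so the function names, window size, and thresholds here are hypothetical and purely illustrative.

```python
# Minimal sketch (not the paper's implementation): coarse, mask-free shadow
# localization from a handcrafted local-contrast map. All thresholds are
# illustrative assumptions.
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_map(gray: np.ndarray, window: int = 31) -> np.ndarray:
    """Per-pixel local contrast of a grayscale document image in [0, 1]."""
    mean = uniform_filter(gray, size=window)
    mean_sq = uniform_filter(gray * gray, size=window)
    std = np.sqrt(np.clip(mean_sq - mean * mean, 0.0, None))
    return std / (mean + 1e-6)  # Weber-like contrast: variation over brightness

def coarse_shadow_mask(gray: np.ndarray, window: int = 31) -> np.ndarray:
    """Coarse shadow localization: shadowed paper is darker than the page's
    dominant (lit) background, while high local contrast flags ink and edges
    that should not be treated as shadow."""
    background = np.median(gray)                     # dominant paper intensity
    darker = gray < 0.8 * background                 # candidate shadow pixels (hypothetical threshold)
    texty = local_contrast_map(gray, window) > 0.5   # high-contrast ink/edges (hypothetical threshold)
    return darker & ~texty
```

In a coarse-to-fine design such as the one described above, a coarse map like this would serve only as the first-stage localization signal; the second stage would refine it and guide feature fusion during network-based shadow removal.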