MatteViT: High-Frequency-Aware Document Shadow Removal with Shadow Matte Guidance

📅 2025-12-09
📈 Citations: 0
Influential: 0

🤖 AI Summary
Document shadow removal matters for digitized document quality and OCR performance, yet existing methods often degrade high-frequency details such as text edges. To address this, the paper proposes MatteViT, a vision Transformer-based shadow removal framework that combines spatial and frequency-domain information. The method introduces a lightweight high-frequency amplification module (HFAM) that decomposes and adaptively amplifies high-frequency components, together with a continuous luminance-based shadow matte that provides precise spatial guidance from the earliest processing stage. Coupled with multi-stage feature fusion and a shadow matte generator trained on a custom-built matte dataset, the approach improves fine-grained structural recovery. Evaluated on the RDD and Kligler benchmarks, it achieves state-of-the-art performance and improves OCR recognition performance over prior methods.

📝 Abstract
Document shadow removal is essential for enhancing the clarity of digitized documents. Preserving high-frequency details (e.g., text edges and lines) is critical in this process because shadows often obscure or distort fine structures. This paper proposes a matte vision transformer (MatteViT), a novel shadow removal framework that applies spatial and frequency-domain information to eliminate shadows while preserving fine-grained structural details. To effectively retain these details, we employ two preservation strategies. First, our method introduces a lightweight high-frequency amplification module (HFAM) that decomposes and adaptively amplifies high-frequency components. Second, we present a continuous luminance-based shadow matte, generated using a custom-built matte dataset and shadow matte generator, which provides precise spatial guidance from the earliest processing stage. These strategies enable the model to accurately identify fine-grained regions and restore them with high fidelity. Extensive experiments on public benchmarks (RDD and Kligler) demonstrate that MatteViT achieves state-of-the-art performance, providing a robust and practical solution for real-world document shadow removal. Furthermore, the proposed method better preserves text-level details in downstream tasks, such as optical character recognition, improving recognition performance over prior methods.
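
The abstract describes HFAM as decomposing and amplifying high-frequency components. As a rough illustration of that idea only, the sketch below splits a grayscale image into low and high bands with an FFT low-pass mask and boosts the high band by a fixed gain. The function name `amplify_high_freq`, the `cutoff` and `gain` parameters, and the FFT-based split are all assumptions made for illustration; the paper's module is learned and adaptive, and its actual decomposition is not specified here.

```python
import numpy as np

def amplify_high_freq(img, cutoff=0.1, gain=1.5):
    """Split a grayscale image into low/high bands and boost the high band.

    cutoff: radial frequency (cycles/pixel) separating the bands (hypothetical).
    gain:   constant multiplier for the high band; a learned module would
            predict this adaptively instead of fixing it.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Radial frequency of every FFT bin.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff
    # Low band: keep only bins inside the cutoff radius.
    low = np.fft.ifft2(np.fft.fft2(img) * low_mask).real
    # High band: the residual, which carries edges and thin strokes.
    high = img - low
    return low + gain * high, low, high

# A vertical step edge stands in for a text stroke boundary.
page = np.zeros((16, 16))
page[:, 8:] = 1.0
boosted, low, high = amplify_high_freq(page, cutoff=0.1, gain=2.0)
```

With `gain=1.0` the function returns the input unchanged, so the gain directly controls how much edge contrast is added on top of the original image.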
Problem

Research questions and friction points this paper is trying to address.

Removes shadows from digitized documents to enhance clarity
Preserves high-frequency details like text edges and lines
Improves optical character recognition performance in downstream tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

MatteViT is a vision transformer that combines spatial and frequency-domain information for shadow removal
HFAM decomposes and adaptively amplifies high-frequency components to preserve details
A continuous luminance-based shadow matte provides spatial guidance from the earliest processing stage
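
To make the last point concrete, here is a minimal sketch of what a continuous (non-binary) luminance-based matte could look like and how it might be supplied at the input stage. It assumes a paired shadowed/shadow-free image is available and defines the matte as one minus their luminance ratio; it also assumes guidance is given by appending the matte as a fourth input channel. Both choices, and the names `luminance`, `continuous_matte`, and `with_matte_channel`, are hypothetical. The paper instead predicts the matte with a trained generator, whose details are not given in this summary.

```python
import numpy as np

# Rec. 601 luma weights for RGB values in [0, 1].
REC601 = np.array([0.299, 0.587, 0.114])

def luminance(rgb):
    """Per-pixel luma of an (H, W, 3) RGB array."""
    return np.asarray(rgb, dtype=np.float64) @ REC601

def continuous_matte(shadowed_rgb, clean_rgb, eps=1e-6):
    """Soft shadow strength in [0, 1]: 0 = unshadowed, 1 = fully dark.

    Assumed definition: one minus the shadowed/clean luminance ratio.
    """
    ratio = luminance(shadowed_rgb) / (luminance(clean_rgb) + eps)
    return 1.0 - np.clip(ratio, 0.0, 1.0)

def with_matte_channel(rgb, matte):
    """Append the matte as a fourth channel (assumed form of early guidance)."""
    return np.concatenate([np.asarray(rgb, dtype=np.float64), matte[..., None]], axis=-1)

# Toy page: uniform paper, right half dimmed to 50% by a shadow.
clean = np.full((4, 4, 3), 0.8)
shadowed = clean.copy()
shadowed[:, 2:] *= 0.5
matte = continuous_matte(shadowed, clean)
net_input = with_matte_channel(shadowed, matte)
```

Unlike a binary mask, the matte here takes intermediate values (about 0.5 in the dimmed half), which is what lets it encode soft shadow boundaries and varying shadow intensity.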