ShadowMaskFormer: Mask Augmented Patch Embeddings for Shadow Removal

📅 2024-04-29
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing Transformer-based shadow removal methods often incorporate shadow priors through complex modifications to the attention mechanism, resulting in bloated architectures and high computational overhead. This paper proposes a lightweight shadow-aware Vision Transformer (ViT) framework. Its core innovation lies in explicitly embedding the shadow mask into the patch embedding layer—specifically at the *front end*, rather than within the attention modules—enabling efficient integration of shadow priors at the very earliest stage of feature extraction. This design avoids redundant structural alterations to self-attention, relying solely on standard multi-head self-attention and supervised mask guidance. Evaluated on ISTD, ISTD+, and SRD benchmarks, our method surpasses state-of-the-art approaches in both reconstruction accuracy—especially within shadow regions—and inference efficiency, while using significantly fewer parameters. The source code is publicly available.

📝 Abstract
The Transformer recently emerged as the de facto model for computer vision tasks and has also been successfully applied to shadow removal. However, existing methods heavily rely on intricate modifications to the attention mechanisms within the transformer blocks while using a generic patch embedding. As a result, this often leads to complex architectural designs requiring additional computational resources. In this work, we aim to explore the efficacy of incorporating shadow information within the early processing stage. Accordingly, we propose a transformer-based framework with a novel patch embedding tailored for shadow removal, dubbed ShadowMaskFormer. Specifically, we present a simple and effective mask-augmented patch embedding to integrate shadow information and promote the model's emphasis on acquiring knowledge for shadow regions. Extensive experiments conducted on the ISTD, ISTD+, and SRD benchmark datasets demonstrate the efficacy of our method against state-of-the-art approaches while using fewer model parameters. Our implementation is available at https://github.com/lizhh268/ShadowMaskFormer.
Problem

Research questions and friction points this paper is trying to address.

Improves shadow removal using transformer-based models
Simplifies architecture by enhancing early patch embeddings
Reduces computational resources while maintaining effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mask-augmented patch embedding for shadows
Transformer-based framework for shadow removal
Simplified design with fewer parameters
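
The core idea above is to inject the shadow mask at tokenization time rather than inside the attention blocks. The sketch below is an illustrative guess at what such a mask-augmented patch embedding could look like, not the paper's exact formulation: here the binary mask is rescaled and attached as an extra input channel before the image is cut into patches and linearly projected, so every token carries shadow information from the very first layer. The rescaling choice, patch size, and projection are all assumptions for illustration.

```python
import numpy as np

def mask_augmented_patch_embed(image, mask, patch=4, dim=32, seed=0):
    """Illustrative mask-augmented patch embedding (NOT the paper's exact
    design): the shadow mask enters at the patch-embedding stage, before
    any self-attention, so the transformer blocks can stay standard."""
    H, W, C = image.shape
    # Rescale the 0/1 mask to -1/+1 so shadow and non-shadow patches get
    # sign-distinct features (one plausible choice; an assumption here).
    m = mask.astype(np.float32) * 2.0 - 1.0
    x = np.concatenate([image, m[..., None]], axis=-1)  # (H, W, C+1)
    # Cut into non-overlapping patch x patch tiles and flatten each tile.
    hp, wp = H // patch, W // patch
    x = x[:hp * patch, :wp * patch]
    x = x.reshape(hp, patch, wp, patch, C + 1).transpose(0, 2, 1, 3, 4)
    tokens = x.reshape(hp * wp, patch * patch * (C + 1))
    # Shared linear projection to the transformer's embedding dimension.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((tokens.shape[1], dim)).astype(np.float32) * 0.02
    return tokens @ proj  # (num_patches, dim)

# Usage: a 32x32 RGB image with a synthetic shadow mask yields 64 tokens.
img = np.random.rand(32, 32, 3).astype(np.float32)
msk = np.zeros((32, 32), dtype=np.float32)
msk[8:24, 8:24] = 1.0  # hypothetical shadow region
emb = mask_augmented_patch_embed(img, msk)
print(emb.shape)  # (64, 32)
```

Because the mask is fused before projection, no extra attention-side machinery is needed, which matches the paper's stated goal of a simpler design with fewer parameters.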