Face Mask Removal with Region-attentive Face Inpainting

📅 2024-09-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The COVID-19 pandemic has severely degraded face recognition performance due to facial occlusion by masks, while practical demands for mask removal—e.g., for social interaction and media editing—remain pressing. To address this, we propose a region-aware generative face inpainting method that faithfully reconstructs masked facial regions while preserving identity consistency and visual fidelity. Our approach is built upon a GAN framework incorporating multi-scale feature fusion and region-weighted losses. Key contributions include: (1) a Multi-scale Channel-Spatial Attention Module (M-CSAM) to enhance local structural modeling; (2) a mask-region-focused supervision strategy to improve reconstruction accuracy within occluded areas; and (3) Masked-Faces, a synthetic dataset comprising five realistic mask types. Extensive experiments demonstrate significant improvements over baselines in SSIM, PSNR, and L1 metrics; qualitative results confirm more natural, identity-preserving, and structurally coherent reconstructions.

📝 Abstract
During the COVID-19 pandemic, face masks have become ubiquitous in our lives. Face masks can cause some face recognition models to fail, since they cover a significant portion of the face. In addition, removing face masks from captured images or videos can be desirable, e.g., for better social interaction and for image/video editing and enhancement purposes. Hence, we propose a generative face inpainting method to effectively recover and reconstruct the masked part of a face. Face inpainting is more challenging than traditional inpainting, since it requires high fidelity while maintaining the identity at the same time. Our proposed method includes a Multi-scale Channel-Spatial Attention Module (M-CSAM) to mitigate spatial information loss and learn the inter- and intra-channel correlations. In addition, we introduce an approach that enforces the supervised signal to focus on masked regions instead of the whole image. We also synthesize our own Masked-Faces dataset from the CelebA dataset by incorporating five different types of face masks, including surgical masks, regular masks, and scarves, which also cover the neck area. The experimental results show that our proposed method outperforms different baselines in terms of structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and L1 loss, while also providing qualitatively better outputs. The code is available on GitHub.
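The mask-region-focused supervision described above can be made concrete with a small sketch: instead of averaging the reconstruction loss over the whole image, the loss is computed only over pixels inside the occluded region. This is a minimal illustration of the idea, not the paper's actual implementation; the function name and the plain-Python image representation are our own assumptions.

```python
def masked_l1_loss(pred, target, mask):
    """L1 loss restricted to pixels where mask == 1.

    pred, target: 2-D lists of floats (a toy grayscale image).
    mask: 2-D list of 0/1 flags marking the occluded (masked) region.
    Illustrative only -- not the paper's implementation.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:  # only masked pixels contribute to the supervision signal
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0

pred   = [[0.2, 0.8], [0.5, 0.1]]
target = [[0.0, 1.0], [0.5, 0.5]]
mask   = [[1, 1], [0, 1]]  # only three of the four pixels are supervised
print(masked_l1_loss(pred, target, mask))  # (0.2 + 0.2 + 0.4) / 3 ≈ 0.2667
```

Concentrating the loss this way keeps the unmasked face region (which the generator should simply preserve) from diluting the gradient signal for the harder occluded area.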
Problem

Research questions and friction points this paper is trying to address.

Recovering masked facial regions for recognition
Enhancing image/video quality by mask removal
Preserving identity fidelity during face inpainting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale Channel-Spatial Attention Module
Supervised signal focusing on masked regions
Synthesized Masked-Faces dataset from CelebA
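To give a feel for the attention mechanism listed above, here is a toy sketch of the channel-attention half of a channel-spatial attention block: each channel is re-weighted by a gate derived from its global average, in the spirit of squeeze-and-excitation. This is an illustration under our own assumptions, not the paper's M-CSAM, which additionally fuses multi-scale spatial attention; a real module would also use a small learned MLP in place of the bare sigmoid.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps):
    """Toy channel attention (squeeze-and-excitation style).

    feature_maps: list of 2-D lists, one per channel.
    Returns the channel-reweighted feature maps.
    """
    # Squeeze: global average pooling produces one descriptor per channel.
    descriptors = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                   for ch in feature_maps]
    # Excite: map each descriptor to a (0, 1) gate. A learned MLP would
    # normally sit here; a plain sigmoid keeps the sketch self-contained.
    gates = [sigmoid(d) for d in descriptors]
    # Re-weight every value in each channel by that channel's gate.
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_maps, gates)]

feats = [[[1.0, 2.0], [3.0, 4.0]],   # active channel: strong gate
         [[0.0, 0.0], [0.0, 0.0]]]   # empty channel: gate of 0.5
out = channel_attention(feats)
```

Spatial attention works analogously but pools across channels to produce a per-pixel gate; combining both, across several scales, is what lets such a module emphasize informative structure inside the occluded region.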