🤖 AI Summary
Under-display camera (UDC) imaging suffers from severe artifacts, including Moiré patterns, blur, and color distortion, caused by the attenuation and diffraction that light undergoes as it passes through the display panel. Conventional CNNs and standard Vision Transformers (ViTs) achieve suboptimal restoration under such complex, spatially varying degradations. To address this, we propose SGSFormer, a novel instance segmentation-guided sparse Transformer architecture. It is the first to explicitly embed instance-level segmentation priors into the self-attention mechanism, enabling precise localization and adaptive modeling of artifact-prone regions. By sparsifying self-attention so that each query attends only to its most relevant features, SGSFormer preserves global contextual modeling while suppressing the redundant information and noise that dense attention aggregates. Evaluated on multiple UDC benchmarks, the end-to-end framework consistently outperforms state-of-the-art CNN- and ViT-based methods, with average gains of +1.27 dB in PSNR and +0.021 in SSIM, and effectively mitigates characteristic UDC degradations.
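As a rough reading of the PSNR figure above (assuming the standard definition of PSNR in terms of mean squared error, and treating the reported average gain as if it were a per-image gain), a +1.27 dB improvement corresponds to about a 25% reduction in MSE:

```latex
\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \qquad
\Delta\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MSE}_{\text{old}}}{\mathrm{MSE}_{\text{new}}}
\;\Longrightarrow\;
\frac{\mathrm{MSE}_{\text{new}}}{\mathrm{MSE}_{\text{old}}} = 10^{-1.27/10} \approx 0.746
```

Because PSNR is averaged in the logarithmic domain across images and benchmarks, this is only an approximate interpretation of the aggregate number.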
📝 Abstract
Under-Display Camera (UDC) is an emerging technology that achieves a full-screen display by hiding the camera under the display panel. However, current UDC implementations cause severe image degradation: the incident light required for imaging undergoes attenuation and diffraction as it passes through the display panel, producing various artifacts in UDC images. Most existing UDC image restoration methods rely on convolutional neural network architectures, whereas Transformer-based methods have shown superior performance in most image restoration tasks. This advantage is attributed to the Transformer's ability to sample global features for local image reconstruction, which enables high-quality restoration. In this paper, we observe that when a Vision Transformer is applied to UDC image restoration, its global attention samples a large amount of redundant information and noise. We further find that, compared with a standard Transformer using dense attention, a Transformer using sparse attention alleviates the adverse impact of this redundant information and noise. Building on these findings, we propose a Segmentation Guided Sparse Transformer (SGSFormer) for restoring high-quality images from UDC-degraded images. Specifically, we use sparse self-attention to filter out redundant information and noise, directing the model's attention to the features most relevant to the degraded regions that need reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions.
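The abstract does not say how the attention is sparsified or how the segmentation map enters the attention computation. The following is a minimal sketch under stated assumptions, not the SGSFormer implementation: sparsity is assumed to be top-k key selection per query (a common sparse-attention scheme), and the segmentation prior is assumed to be an additive bias `beta` on scores between tokens that share an instance label. The function name `seg_guided_topk_attention`, the parameters `k` and `beta`, and the use of raw features as queries, keys, and values (no learned projections) are all hypothetical simplifications.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def seg_guided_topk_attention(
    tokens: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
    beta: float = 1.0,
) -> List[Vector]:
    """Hypothetical top-k sparse self-attention with a segmentation bias.

    tokens: N feature vectors of dimension d, used directly as Q, K and V
            (a real model would apply learned projections first).
    labels: instance-segmentation id of each token (assumed prior format).
    k:      number of keys each query keeps after sparsification.
    beta:   assumed bias added to scores of keys in the query's instance.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    n = len(tokens)
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out: List[Vector] = []
    for i in range(n):
        # Scaled dot-product scores, biased toward the same instance.
        scores = []
        for j in range(n):
            s = dot(tokens[i], tokens[j]) * scale
            if labels[i] == labels[j]:
                s += beta
            scores.append(s)
        # Sparsify: keep only the k highest-scoring keys for this query.
        keep = sorted(range(n), key=lambda j: scores[j], reverse=True)[:k]
        # Softmax over the kept keys only (max-subtracted for stability).
        m = max(scores[j] for j in keep)
        weights = {j: math.exp(scores[j] - m) for j in keep}
        total = sum(weights.values())
        # Weighted sum of values; dropped keys contribute nothing.
        out.append([
            sum(weights[j] / total * tokens[j][c] for j in keep)
            for c in range(d)
        ])
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    seg = [0, 0, 1, 1]
    for row in seg_guided_topk_attention(feats, seg, k=2, beta=2.0):
        print([round(x, 3) for x in row])
```

Setting `k` equal to the number of tokens and `beta=0` recovers ordinary dense softmax attention, which makes the two assumed ingredients (top-k filtering and the instance bias) easy to toggle and compare. In a practical PyTorch model the selection step would typically use `torch.topk` on the score matrix rather than a Python loop.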