🤖 AI Summary
This work addresses a key limitation of blind-spot networks in real-world sRGB image denoising: because they assume pixel-wise noise independence, they struggle to model the spatially correlated noise introduced by camera ISP pipelines, particularly by demosaicing. To overcome this, the authors propose a novel blind-spot network that, for the first time, uses a triangular masked convolution to construct a diamond-shaped blind region aligned with the structure of demosaicing-induced noise correlation. This design enables self-supervised denoising at full resolution, without downsampling or post-processing. In addition, multi-prediction knowledge distillation transfers the blind-spot network's knowledge to a lightweight U-Net, improving both accuracy and efficiency. The method achieves state-of-the-art self-supervised results on multiple real-image denoising benchmarks, significantly outperforming current approaches.
📝 Abstract
Blind-spot networks (BSNs) enable self-supervised image denoising by preventing access to the target pixel, allowing clean signal estimation without ground-truth supervision. However, this approach assumes pixel-wise noise independence, which is violated in real-world sRGB images due to spatially correlated noise from the camera's image signal processing (ISP) pipeline. While several methods employ downsampling to decorrelate noise, they alter noise statistics and limit the network's ability to utilize full contextual information. In this paper, we propose the Triangular-Masked Blind-Spot Network (TM-BSN), a novel blind-spot architecture that accurately models the spatial correlation of real sRGB noise. This correlation originates from demosaicing, where each pixel is reconstructed from neighboring samples with spatially decaying weights, resulting in a diamond-shaped pattern. To align the receptive field with this geometry, we introduce a triangular-masked convolution that restricts the kernel to its upper-triangular region, creating a diamond-shaped blind spot at the original resolution. This design excludes correlated pixels while fully leveraging uncorrelated context, eliminating the need for downsampling or post-processing. Furthermore, we use knowledge distillation to transfer complementary knowledge from multiple blind-spot predictions into a lightweight U-Net, improving both accuracy and efficiency. Extensive experiments on real-world benchmarks demonstrate that our method achieves state-of-the-art performance, significantly outperforming existing self-supervised approaches. Our code is available at https://github.com/parkjun210/TM-BSN.
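To make the core idea concrete, here is a minimal NumPy sketch of a convolution whose kernel is restricted to its strict upper-triangular region. This is an illustration of the masking principle only, not the paper's implementation: TM-BSN combines such masked convolutions (e.g., with rotated variants inside the network) to carve out the full diamond-shaped blind spot, which this single-kernel sketch does not reproduce. The function name and kernel size are illustrative choices.

```python
import numpy as np

def triangular_masked_conv2d(image, kernel):
    """2D convolution with the kernel restricted to its strict
    upper-triangular region (entries strictly above the main diagonal).
    Illustrative sketch; the paper's exact mask construction may differ."""
    k = kernel.shape[0]
    # Keep only entries with column index > row index; the center tap
    # lies on the diagonal, so it is zeroed out (the "blind" pixel).
    mask = np.triu(np.ones((k, k)), k=1)
    masked_kernel = kernel * mask
    pad = k // 2
    padded = np.pad(image, pad, mode="reflect")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(padded[y:y + k, x:x + k] * masked_kernel)
    return out

# A delta image shows the blind spot: the output at the delta's own
# location receives no contribution from it.
img = np.zeros((7, 7))
img[3, 3] = 1.0
out = triangular_masked_conv2d(img, np.ones((5, 5)))
print(out[3, 3])  # 0.0: the center tap is masked out
```

Note that a single masked branch sees only one triangular half-plane of context; aggregating predictions from complementary masked branches is what lets the network use all uncorrelated neighbors while still excluding the correlated diamond region.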