Frames2Residual: Spatiotemporal Decoupling for Self-Supervised Video Denoising

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of existing self-supervised video denoising methods: masking the center pixel in blind-spot networks compromises both temporal consistency and spatial detail preservation. To overcome this limitation, we propose a decoupled two-stage spatiotemporal self-supervised framework. In the first stage, a frame-level blind-spot strategy is employed to learn temporal consistency and generate temporally stable anchor frames. In the second stage, high-frequency spatial residuals are recovered in a non-blind manner based on these anchors, effectively preserving fine texture details. By explicitly separating blind temporal modeling from non-blind spatial restoration, our approach circumvents the inherent constraint of conventional blind-spot networks that renders the center pixel unobservable. Extensive experiments demonstrate that our method significantly outperforms existing self-supervised approaches on both sRGB and RAW video benchmarks, achieving superior denoising performance and visual quality.
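To make the frame-level blind-spot idea concrete, here is a minimal sketch of how the center frame of a noisy window might be withheld from the Stage-1 network. The tensor layout and the helper name `frame_blind_input` are assumptions for illustration, not the paper's implementation.

```python
import torch

def frame_blind_input(window: torch.Tensor, center: int) -> torch.Tensor:
    """Drop the center frame from a noisy (N, T, C, H, W) window.

    A Stage-1 network trained to predict the held-out center frame from
    the remaining T-1 neighbors never observes that frame's own noise:
    the frame-level analogue of masking the center pixel in a pixel-wise
    blind-spot network.
    """
    keep = [t for t in range(window.shape[1]) if t != center]
    return window[:, keep]  # shape (N, T-1, C, H, W)
```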

📝 Abstract
Self-supervised video denoising methods typically extend image-based frameworks into the temporal dimension, yet they often struggle to integrate inter-frame temporal consistency with intra-frame spatial specificity. Existing Video Blind-Spot Networks (BSNs) require noise independence by masking the center pixel, this constraint prevents the use of spatial evidence for texture recovery, thereby severing spatiotemporal correlations and causing texture loss. To address this, we propose Frames2Residual (F2R), a spatiotemporal decoupling framework that explicitly divides self-supervised training into two distinct stages: blind temporal consistency modeling and non-blind spatial texture recovery. In Stage 1, a blind temporal estimator learns inter-frame consistency using a frame-wise blind strategy, producing a temporally consistent anchor. In Stage 2, a non-blind spatial refiner leverages this anchor to safely reintroduce the center frame and recover intra-frame high-frequency spatial residuals while preserving temporal stability. Extensive experiments demonstrate that our decoupling strategy allows F2R to outperform existing self-supervised methods on both sRGB and raw video benchmarks.
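The decoupled pipeline described in the abstract can be sketched as follows. The module names, the simple convolutional placeholders, and the concatenation interface between the anchor and the noisy center frame are all assumptions made for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TemporalEstimator(nn.Module):
    """Stand-in for the Stage-1 blind temporal network."""
    def __init__(self, n_neighbors: int, channels: int = 3):
        super().__init__()
        self.net = nn.Conv2d(n_neighbors * channels, channels, 3, padding=1)

    def forward(self, neighbors: torch.Tensor) -> torch.Tensor:
        n, t, c, h, w = neighbors.shape
        # Stack the T-1 neighbor frames along channels and predict the anchor.
        return self.net(neighbors.reshape(n, t * c, h, w))

class SpatialRefiner(nn.Module):
    """Stand-in for the Stage-2 non-blind refiner; outputs a residual."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, noisy_center: torch.Tensor, anchor: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([noisy_center, anchor], dim=1))

def f2r_denoise(window: torch.Tensor, center: int,
                estimator: TemporalEstimator,
                refiner: SpatialRefiner) -> torch.Tensor:
    # Stage 1 (blind): a temporally consistent anchor predicted from
    # neighbor frames only; the noisy center frame is withheld.
    keep = [t for t in range(window.shape[1]) if t != center]
    anchor = estimator(window[:, keep])
    # Stage 2 (non-blind): the noisy center frame is safely reintroduced,
    # and only a high-frequency spatial residual is added to the anchor.
    return anchor + refiner(window[:, center], anchor)

# Example: a 5-frame window of 3-channel 64x64 frames.
window = torch.randn(1, 5, 3, 64, 64)
est, ref = TemporalEstimator(n_neighbors=4), SpatialRefiner()
denoised = f2r_denoise(window, center=2, estimator=est, refiner=ref)
print(denoised.shape)  # torch.Size([1, 3, 64, 64])
```

Because the refiner only predicts a residual on top of the Stage-1 anchor, spatial detail can be recovered without disturbing the temporal stability that Stage 1 established.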
Problem

Research questions and friction points this paper is trying to address.

video denoising
self-supervised learning
spatiotemporal correlation
blind-spot network
texture recovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatiotemporal decoupling
self-supervised video denoising
blind-spot network
temporal consistency
spatial texture recovery