F2T2-HiT: A U-Shaped FFT Transformer and Hierarchical Transformer for Reflection Removal

📅 2025-06-05
📈 Citations: 2
Influential: 0
🤖 AI Summary
Single-image reflection removal (SIRR) aims to recover a reflection-free background from a single image corrupted by glass reflections. It remains an open challenge because reflections vary widely in intensity, shape, and spatial distribution. This paper proposes a frequency-spatial collaborative framework: FFT Transformer blocks bring global frequency-domain information into SIRR to capture and separate reflection patterns, and Hierarchical Transformer blocks extract multi-scale features, with both arranged in a U-Net-style encoder-decoder. The method reports state-of-the-art PSNR and SSIM on three publicly available test datasets and is described as robust to strong, large-area, and non-uniformly distributed reflections.
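For readers unfamiliar with the PSNR metric mentioned in the summary, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). The function name `psnr` and the `max_value` parameter are illustrative assumptions; the paper's actual evaluation code (color space, cropping, data range) is not described on this page.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    `max_value` is the largest possible pixel value (assumed 1.0 for
    images scaled to [0, 1]; use 255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

A higher value means the predicted background is closer to the ground-truth background; SSIM, the other metric mentioned, measures structural similarity and is not sketched here.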

📝 Abstract
Single Image Reflection Removal (SIRR) plays a crucial role in image processing by removing unwanted reflections from an image to recover the background. These reflections, typically introduced when photographing through glass surfaces, can significantly degrade image quality. SIRR remains challenging because real-world reflections are complex and varied: they differ in intensity, shape, light source, size, and coverage area across the image, so most existing methods cannot handle all cases effectively. To address these challenges, this paper introduces a U-shaped Fast Fourier Transform Transformer and Hierarchical Transformer (F2T2-HiT) architecture, a Transformer-based design for SIRR. The approach combines Fast Fourier Transform (FFT) Transformer blocks and Hierarchical Transformer blocks within a UNet framework. The FFT Transformer blocks use global frequency-domain information to capture and separate reflection patterns, while the Hierarchical Transformer blocks use multi-scale feature extraction to handle reflections of varying sizes and complexities. Extensive experiments on three publicly available test datasets demonstrate state-of-the-art performance, validating the effectiveness of the approach.
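
To illustrate why frequency-domain processing gives a global view of the image, here is a minimal NumPy sketch of spectral filtering over a feature map. It is not the paper's FFT Transformer block, whose internals are not given on this page; the function `global_frequency_filter` and the `spectral_weight` tensor (which would be learned in a real model) are assumptions made for illustration.

```python
import numpy as np

def global_frequency_filter(features, spectral_weight):
    """Filter an (H, W, C) feature map in the 2D frequency domain.

    `spectral_weight` must broadcast against the half-spectrum of shape
    (H, W // 2 + 1, C). Each frequency coefficient depends on every
    pixel, so one multiplication here affects the whole image.
    """
    h, w, _ = features.shape
    # Real-input 2D FFT over the two spatial axes.
    spectrum = np.fft.rfft2(features, axes=(0, 1))
    # Element-wise reweighting of frequencies (hypothetical learned weights).
    filtered = spectrum * spectral_weight
    # Back to the spatial domain at the original resolution.
    return np.fft.irfft2(filtered, s=(h, w), axes=(0, 1))
```

With all-ones weights the function returns its input unchanged. Keeping only the zero-frequency coefficient replaces every pixel with the per-channel mean, which shows that a single spectral coefficient has an image-wide footprint.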

Problem

Research questions and friction points this paper is trying to address.

Removing complex reflections from single images
Handling varied reflection intensities and shapes
Improving image quality in real-world scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

U-shaped FFT Transformer for reflection separation
Hierarchical Transformer for multi-scale feature extraction
UNet framework combining FFT Transformer and Hierarchical Transformer blocks
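
The U-shaped layout named in the points above can be sketched as a generic encoder-decoder with skip connections. This is only a structural illustration: the helper names, the average-pool and nearest-neighbour resampling, the additive skips, and the placement of blocks are assumptions. The page does not specify where the FFT Transformer and Hierarchical Transformer blocks sit, so the blocks are passed in as plain callables.

```python
import numpy as np

def downsample(x):
    """Halve spatial resolution of an (H, W, C) map by 2x2 average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """Double spatial resolution by nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def u_shaped_forward(x, encoder_blocks, bottleneck, decoder_blocks):
    """Run a U-shaped pass; H and W must be divisible by 2**len(encoder_blocks)."""
    skips = []
    for block in encoder_blocks:
        x = block(x)
        skips.append(x)          # keep features for the matching decoder level
        x = downsample(x)
    x = bottleneck(x)
    for block, skip in zip(decoder_blocks, reversed(skips)):
        # Additive skip connection (an assumption; concatenation is also common).
        x = block(upsample(x) + skip)
    return x
```

In a real model each callable would be a Transformer block operating at that scale; passing identity functions is enough to check that shapes line up across levels.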