🤖 AI Summary
Existing Transformer-based methods for unified restoration of natural images degraded by diverse factors (e.g., haze, rain/snow, blur, low light) suffer from high model complexity, poor generalization across degradation types, and oversimplified prior modeling. Method: We propose a lightweight multi-domain collaborative Transformer framework featuring a novel spatial-wavelet-Fourier hybrid token mixer that explicitly captures cross-degradation common priors; it further incorporates systematic multi-scale feature fusion and cross-task joint training. Contribution/Results: Our method achieves state-of-the-art performance across ten diverse image restoration tasks, reducing parameter count by 32% and inference latency by 41% compared to prior art, thereby significantly improving the holistic trade-off among accuracy, efficiency, and generalization.
📝 Abstract
Due to adverse atmospheric and imaging conditions, natural images suffer from various degradation phenomena. Consequently, image restoration has emerged as a key solution and garnered substantial attention. Although recent Transformer architectures have demonstrated impressive success across various restoration tasks, their considerable model complexity poses significant challenges for both training and real-time deployment. Furthermore, instead of investigating the commonalities among different degradations, most existing restoration methods focus on modifying Transformer architectures under limited restoration priors. In this work, we first review various degradation phenomena from a multi-domain perspective, identifying their common priors. Then, we introduce a novel restoration framework that integrates multi-domain learning into the Transformer. Specifically, in the token mixer, we propose a Spatial-Wavelet-Fourier multi-domain structure that replaces vanilla self-attention and enables local-region-global modeling across multiple receptive fields. Additionally, in the feed-forward network, we incorporate multi-scale learning to fuse multi-domain features at different resolutions. Comprehensive experimental results across ten restoration tasks, namely dehazing, desnowing, motion deblurring, defocus deblurring, rain streak removal, raindrop removal, cloud removal, shadow removal, underwater enhancement and low-light enhancement, demonstrate that our proposed model outperforms state-of-the-art methods and achieves a favorable trade-off among restoration performance, parameter size, computational cost and inference latency. The code is available at: https://github.com/deng-ai-lab/SWFormer.
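The local-region-global idea behind the Spatial-Wavelet-Fourier mixer can be sketched in a toy, single-channel form. This is a hedged illustration only: the 3×3 mean filter, one-level Haar transform, fixed low-pass spectral mask, and branch weights `gamma` are illustrative stand-ins for whatever learned modules the actual model uses, and `swf_token_mixer` is a hypothetical name, not an API from the SWFormer repository.

```python
import numpy as np

def haar2d(x):
    """One-level 2D Haar decomposition into LL, LH, HL, HH subbands."""
    a = (x[0::2, :] + x[1::2, :]) / 2   # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2
    lh = (a[:, 0::2] - a[:, 1::2]) / 2
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d (perfect reconstruction)."""
    H, W = ll.shape
    a = np.zeros((H, 2 * W)); d = np.zeros((H, 2 * W))
    a[:, 0::2] = ll + lh; a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh; d[:, 1::2] = hl - hh
    x = np.zeros((2 * H, 2 * W))
    x[0::2, :] = a + d; x[1::2, :] = a - d
    return x

def swf_token_mixer(x, gamma=(1.0, 1.0, 1.0)):
    """Illustrative spatial-wavelet-Fourier mixing of one feature map.

    gamma are fixed per-branch weights standing in for learned parameters.
    """
    H, W = x.shape
    # Spatial branch: 3x3 mean filter -> local receptive field.
    pad = np.pad(x, 1, mode="edge")
    spatial = sum(pad[i:i + H, j:j + W]
                  for i in range(3) for j in range(3)) / 9.0
    # Wavelet branch: attenuate high-frequency subbands -> regional modeling.
    ll, lh, hl, hh = haar2d(x)
    wavelet = ihaar2d(ll, 0.5 * lh, 0.5 * hl, 0.5 * hh)
    # Fourier branch: element-wise spectral filter -> global receptive field.
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    lowpass = (np.sqrt(fy**2 + fx**2) < 0.25).astype(float)
    fourier = np.real(np.fft.ifft2(np.fft.fft2(x) * lowpass))
    return gamma[0] * spatial + gamma[1] * wavelet + gamma[2] * fourier
```

Because each branch is a linear map applied in a different domain, their sum mixes tokens at three receptive-field scales in a single pass, without the quadratic cost of self-attention; the full model presumably learns the filters and fusion weights instead of fixing them as above.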