🤖 AI Summary
To address the high computational complexity and deployment cost of Transformer-based super-resolution models, this paper proposes ASID, an efficient lightweight single-image super-resolution network. The method rests on two key ideas: (1) an attention-sharing mechanism that reuses self-attention maps across Transformer blocks to cut redundant computation, and (2) an information distillation structure, originally devised for efficient CNN operations, restructured to suit stacked self-attention layers. The resulting model contains only around 300K parameters yet achieves competitive performance across multiple benchmark datasets, outperforming both CNN- and Transformer-based methods at matched parameter counts and helping to reconcile the long-standing trade-off between model compactness and reconstruction accuracy.
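The attention-sharing idea described above can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the paper's implementation: the names (`Wq`, `Wk`, `Wv`, `share_every`) and the policy of recomputing the attention map every few blocks are assumptions for illustration. The point is that blocks which share an attention map skip the Q/K projections and softmax entirely, applying only their own value projection.

```python
import math

def softmax(row):
    # Numerically stable row softmax.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    # (n x k) @ (k x m) on nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def attention_map(X, Wq, Wk):
    # A = softmax((X Wq)(X Wk)^T / sqrt(d)), row-wise.
    Q, K = matmul(X, Wq), matmul(X, Wk)
    d = len(Wq[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in K]
              for qr in Q]
    return [softmax(r) for r in scores]

def shared_attention_blocks(X, blocks, share_every=2):
    # Each block has its own value projection Wv; the attention map A is
    # recomputed only every `share_every` blocks and reused in between,
    # so the sharing blocks skip the Q/K projections and softmax.
    A = None
    for i, blk in enumerate(blocks):
        if i % share_every == 0:
            A = attention_map(X, blk["Wq"], blk["Wk"])
        X = matmul(A, matmul(X, blk["Wv"]))  # reuse A; only V is fresh
    return X
```

With `share_every=2`, half of the blocks avoid the quadratic score computation, which is where the savings in self-attention cost come from.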
📝 Abstract
Transformer-based Super-Resolution (SR) methods have demonstrated superior performance compared to convolutional neural network (CNN)-based SR approaches due to their capability to capture long-range dependencies. However, their high computational complexity necessitates the development of lightweight approaches for practical use. To address this challenge, we propose the Attention-Sharing Information Distillation (ASID) network, a lightweight SR network that integrates attention-sharing and an information distillation structure specifically designed for Transformer-based SR methods. We modify the information distillation scheme, originally designed for efficient CNN operations, to reduce the computational load of stacked self-attention layers, effectively addressing the efficiency bottleneck. Additionally, we introduce attention-sharing across blocks to further minimize the computational cost of self-attention operations. By combining these strategies, ASID achieves competitive performance with existing SR methods while requiring only around 300K parameters, significantly fewer than existing CNN-based and Transformer-based SR models. Furthermore, ASID outperforms state-of-the-art SR methods when the number of parameters is matched, demonstrating its efficiency and effectiveness. The code and supplementary material are available on the project page.
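The information distillation scheme the abstract adapts can be sketched in a simplified, framework-free form. This is a hedged illustration of the general distillation pattern (as popularized by CNN models such as IMDN), not ASID's exact module: at each stage a "distilled" slice of the features is set aside and only the remaining coarse slice is passed through further processing, with all retained slices fused at the end. The `keep_ratio` and the placeholder `refine` function are assumptions; in a Transformer-based variant, `refine` would be a (possibly attention-sharing) Transformer block rather than a convolution.

```python
def distillation_stage(features, keep_ratio=0.25, refine=lambda xs: [x * 0.9 for x in xs]):
    # Split the feature list: keep a "distilled" slice aside,
    # and pass only the remaining coarse slice through `refine`.
    k = max(1, int(len(features) * keep_ratio))
    distilled, coarse = features[:k], features[k:]
    return distilled, refine(coarse)

def information_distillation(features, num_stages=3):
    # Progressively peel off distilled features; only the shrinking
    # coarse part is processed further, which cuts the cost of the
    # stacked (self-attention) layers inside each stage.
    retained = []
    for _ in range(num_stages):
        d, features = distillation_stage(features)
        retained.extend(d)
    return retained + features  # fusion step (plain concatenation here)
```

Because each stage processes a shorter coarse slice than the last, the expensive per-stage operation runs on progressively less data, which is the efficiency argument the abstract carries over from CNN-based distillation to stacked self-attention layers.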