Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational complexity and deployment challenges of Transformer-based super-resolution models, this paper proposes an efficient lightweight single-image super-resolution network. The method introduces three key innovations: (1) a cross-block attention-sharing mechanism, tailored to Transformers, that reduces redundant self-attention computation; (2) a restructured information distillation module that better supports hierarchical feature propagation in Transformer architectures; and (3) a lightweight Transformer block combined with aggressive parameter reduction. The resulting model contains only about 300K parameters yet achieves state-of-the-art performance across multiple benchmark datasets, outperforming both CNN- and Transformer-based methods with comparable parameter counts. Moreover, it delivers significantly faster inference, addressing the long-standing trade-off between model size and reconstruction accuracy.
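The information distillation idea mentioned above (originally from efficient CNN designs such as IMDN) progressively splits refined features into a "distilled" part that is kept and a "coarse" part that continues through further refinement, then fuses all distilled parts. The paper's exact module layout is not given here; the following is a minimal NumPy sketch of the general splitting pattern, with all shapes, channel counts, and the `distill_step` helper being illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def conv1x1_relu(x, w):
    # A 1x1 convolution on (channels, pixels) features is just a matmul,
    # followed by ReLU. Stands in for the block's refinement layer.
    return np.maximum(w @ x, 0.0)

def distill_step(x, w, keep):
    # Refine, then split: the first `keep` channels are distilled (retained),
    # the remaining channels are passed on to the next refinement stage.
    y = conv1x1_relu(x, w)
    return y[:keep], y[keep:]

rng = np.random.default_rng(0)
channels, pixels = 16, 64
x = rng.standard_normal((channels, pixels))

distilled = []
for _ in range(3):  # three distillation stages (illustrative depth)
    w = rng.standard_normal((channels, x.shape[0]))
    kept, x = distill_step(x, w, keep=4)
    distilled.append(kept)
distilled.append(x)  # the final coarse features are also retained

# Fuse all hierarchical distilled features along the channel axis.
out = np.concatenate(distilled, axis=0)
```

Because later stages operate on progressively fewer channels, each refinement step is cheaper than a full-width layer, which is the source of the efficiency gain the summary refers to.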

📝 Abstract
Transformer-based Super-Resolution (SR) methods have demonstrated superior performance compared to convolutional neural network (CNN)-based SR approaches due to their capability to capture long-range dependencies. However, their high computational complexity necessitates the development of lightweight approaches for practical use. To address this challenge, we propose the Attention-Sharing Information Distillation (ASID) network, a lightweight SR network that integrates attention-sharing and an information distillation structure specifically designed for Transformer-based SR methods. We modify the information distillation scheme, originally designed for efficient CNN operations, to reduce the computational load of stacked self-attention layers, effectively addressing the efficiency bottleneck. Additionally, we introduce attention-sharing across blocks to further minimize the computational cost of self-attention operations. By combining these strategies, ASID achieves competitive performance with existing SR methods while requiring only around 300K parameters, significantly fewer than existing CNN-based and Transformer-based SR models. Furthermore, ASID outperforms state-of-the-art SR methods when the number of parameters is matched, demonstrating its efficiency and effectiveness. The code and supplementary material are available on the project page.
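The attention-sharing described in the abstract amortizes the expensive part of self-attention (the Q/K projections and the softmaxed attention map) by computing it once and reusing it in subsequent blocks, which then only apply their own value projections. The sketch below illustrates that reuse pattern in plain NumPy; the single-head setup, toy block function, and all weight names are illustrative assumptions, not ASID's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(x, wq, wk):
    # Scaled dot-product attention weights: the part that is
    # expensive to recompute in every stacked block.
    d = wq.shape[1]
    return softmax((x @ wq) @ (x @ wk).T / np.sqrt(d))

def toy_block(x, wv, wq=None, wk=None, shared_attn=None):
    # If a shared attention map is supplied, skip the Q/K
    # projections and the softmax entirely.
    attn = shared_attn if shared_attn is not None else attention_map(x, wq, wk)
    return attn @ (x @ wv), attn

rng = np.random.default_rng(0)
n_tokens, dim = 8, 16
x = rng.standard_normal((n_tokens, dim))
wq, wk = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))
wv1, wv2 = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))

# Block 1 computes the attention map; block 2 reuses (shares) it,
# paying only for its value projection.
y1, attn = toy_block(x, wv1, wq=wq, wk=wk)
y2, _ = toy_block(y1, wv2, shared_attn=attn)
```

The trade-off is that the sharing block attends with slightly stale weights (computed from an earlier block's input), which is what makes this a lightweight approximation rather than full per-block self-attention.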
Problem

Research questions and friction points this paper is trying to address.

Lightweight Super-Resolution
Transformer-based SR Optimization
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

ASID Transformer
Attention Sharing
Information Distillation