MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution

📅 2024-11-26
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
To address the high computational cost, limited effective receptive field, and insufficient intermediate feature diversity caused by enlarged self-attention windows in conventional Transformers for image super-resolution, this paper proposes a lightweight and efficient Transformer architecture. The method introduces two key components: (1) multi-range attention (MA) and sparse multi-range attention (SMA), which exploit dilation operations within self-attention to jointly model regional dependencies and sparse global dependencies; and (2) an MSConvStar module that strengthens multi-range representation learning alongside local feature extraction. Together these designs enhance feature diversity and modeling efficiency. Extensive experiments show that the approach achieves superior reconstruction accuracy while significantly accelerating inference (about 3.3× faster than SRFormer-light), outperforming existing state-of-the-art methods in both quantitative metrics and visual quality.

๐Ÿ“ Abstract
Image super-resolution (SR) has significantly advanced through the adoption of Transformer architectures. However, conventional techniques that enlarge the self-attention window to capture broader context come with inherent drawbacks, especially significantly increased computational demands. Moreover, feature perception within a fixed-size window restricts the effective receptive field (ERF) and the diversity of intermediate features in existing models. We demonstrate that flexibly integrating attention across diverse spatial extents can yield significant performance gains. In line with this insight, we introduce the Multi-Range Attention Transformer (MAT) for SR tasks. MAT leverages the computational advantages of the dilation operation, in conjunction with the self-attention mechanism, to realize both multi-range attention (MA) and sparse multi-range attention (SMA), enabling efficient capture of both regional and sparse global features. Combined with local feature extraction, MAT adeptly captures dependencies across various spatial ranges, improving the diversity and efficacy of its feature representations. We also introduce the MSConvStar module, which augments the model's capacity for multi-range representation learning. Comprehensive experiments show that MAT outperforms existing state-of-the-art SR models with remarkable efficiency (~3.3× faster than SRFormer-light).
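The core idea of combining dilation with self-attention can be illustrated with a toy sketch: each token attends only to tokens at the same offset modulo a dilation factor, so a dilation of 1 gives dense local attention while larger dilations give a sparse, long-range pattern at lower cost. This is a minimal single-head numpy illustration of the general atrous-attention pattern, not the paper's actual MA/SMA implementation; the identity Q/K/V projections and 1-D token layout are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_self_attention(x, dilation):
    """Toy single-head self-attention with an atrous (dilated) sparsity
    pattern: token i attends only to tokens j with j % dilation == i % dilation.
    x: (n, d) token features; Q = K = V = x for simplicity (an assumption).
    """
    n, d = x.shape
    out = np.zeros_like(x)
    for r in range(dilation):
        idx = np.arange(r, n, dilation)        # one strided token group
        g = x[idx]                             # (m, d) gathered tokens
        scores = g @ g.T / np.sqrt(d)          # attention within the group
        out[idx] = softmax(scores) @ g         # scatter results back
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y_dense = dilated_self_attention(x, dilation=1)   # dense attention over all tokens
y_sparse = dilated_self_attention(x, dilation=2)  # sparse long-range pattern
```

With dilation 1 this reduces to ordinary dense self-attention; larger dilations trade locality for reach at the same per-group cost, which is the efficiency lever the abstract describes.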
Problem

Research questions and friction points this paper is trying to address.

Addresses high computational demands in image super-resolution.
Enhances feature diversity and effective receptive field.
Improves efficiency and performance in SR tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Range Attention Transformer for SR
Combines dilation with self-attention mechanism
MSConvStar enhances multi-range representation learning
Chengxing Xie
School of Computing and Artificial Intelligence, Southwest Jiaotong University, China
Xiaoming Zhang
DAMO Academy, Alibaba Group, Hangzhou, China
Kai Zhang
School of Intelligence Science and Technology, Nanjing University, Suzhou Campus, Suzhou, China
Linze Li
Jiangnan University
Yuqian Fu
INSAIT, Sofia, Bulgaria
Biao Gong
Ant Group | Alibaba Group
Tian-Ping Li