Progressive Focused Transformer for Single Image Super-Resolution

📅 2025-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the computational redundancy and performance bottlenecks that global self-attention causes in Transformer-based single image super-resolution (SISR), this paper proposes the Progressive Focused Transformer (PFT), built around Progressive Focused Attention (PFA). PFA links the otherwise isolated attention maps across layers: each layer uses cross-layer attention-map correlations to predict which tokens matter for a given query and prunes irrelevant ones before computing similarities, jointly enabling attention sparsification, token importance prediction, and dynamic similarity-based pruning. The authors present this layer-wise focusing paradigm as the first of its kind in SISR Transformers, preserving modeling capacity while significantly improving efficiency. On multiple standard SISR benchmarks, PFT reports state-of-the-art (SOTA) performance: PSNR gains of 0.15–0.32 dB over prior methods with approximately a 37% reduction in FLOPs, demonstrating a superior accuracy–efficiency trade-off.

📝 Abstract
Transformer-based methods have achieved remarkable results in image super-resolution tasks because they can capture non-local dependencies in low-quality input images. However, this feature-intensive modeling approach is computationally expensive because it calculates the similarities between numerous features that are irrelevant to the query features when obtaining attention weights. These unnecessary similarity calculations not only degrade the reconstruction performance but also introduce significant computational overhead. How to accurately identify the features that are important to the current query features and avoid similarity calculations between irrelevant features remains an urgent problem. To address this issue, we propose a novel and effective Progressive Focused Transformer (PFT) that links all isolated attention maps in the network through Progressive Focused Attention (PFA) to focus attention on the most important tokens. PFA not only enables the network to capture more critical similar features, but also significantly reduces the computational cost of the overall network by filtering out irrelevant features before calculating similarities. Extensive experiments demonstrate the effectiveness of the proposed method, achieving state-of-the-art performance on various single image super-resolution benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Reduce computational cost in Transformer-based super-resolution
Filter irrelevant features before similarity calculations
Improve attention on critical tokens for better performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Focused Transformer for efficient super-resolution
Progressive Focused Attention to prioritize important tokens
Reduces computational cost by filtering irrelevant features
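The core mechanism, selecting each query's relevant tokens from the previous layer's attention map and skipping similarity computation for the rest, can be sketched as follows. This is a minimal hypothetical simplification for illustration (the function name, single-head NumPy shapes, and top-k selection are assumptions, not the paper's exact implementation):

```python
import numpy as np

def focused_attention(q, k, v, prev_attn=None, keep=3):
    """Sketch of progressively focused attention for one head.

    q, k, v: (n_tokens, dim) arrays.
    prev_attn: (n_tokens, n_tokens) attention map from the previous
               layer, used to predict which keys matter per query.
    keep: number of candidate keys retained per query.
    """
    n, d = q.shape
    if prev_attn is None:
        # First layer: no prior map yet, so attend densely.
        scores = q @ k.T / np.sqrt(d)
    else:
        scores = np.full((n, n), -np.inf)
        for i in range(n):
            # Keep only the keys the previous layer weighted highest,
            # avoiding similarity computation for irrelevant tokens.
            idx = np.argsort(prev_attn[i])[-keep:]
            scores[i, idx] = q[i] @ k[idx].T / np.sqrt(d)
    # Softmax; pruned entries (-inf) become exact zeros.
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v, attn
```

In this sketch the first layer pays the full dense cost, while every later layer computes only `keep` similarities per query, which is where the FLOPs reduction would come from; the paper's actual PFA is more involved (multi-scale features, dynamic similarity-based pruning) than this fixed top-k rule.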
Authors
Wei Long (University of Electronic Science and Technology of China)
Xingyu Zhou (University of Electronic Science and Technology of China)
Leheng Zhang (University of Electronic Science and Technology of China): image restoration
Shuhang Gu (University of Electronic Science and Technology of China): image processing, pattern recognition, computer vision