🤖 AI Summary
Image super-resolution acceleration faces dual challenges of high computational cost and poor generalization. To address these, we propose a training-free, frequency-driven adaptive sparsity mechanism: edge and texture priors are extracted via Gaussian-blur differencing; binary masks are generated via K-means clustering to enable pixel- or window-level dynamic computation allocation. In CNNs, sparse inference is realized through unfold + 1×1 convolution; in Transformers, mask-guided token selection is introduced, complemented by a tunable dilation strategy for enhanced robustness. The method is plug-and-play for both CNN and Transformer architectures and exhibits strong adaptability to unseen degradations (e.g., noise, compression artifacts). Evaluated on state-of-the-art models—including CARN and SwinIR—it reduces FLOPs by 24–43% while maintaining or improving PSNR and SSIM. Code is publicly available.
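The mask-generation step described above (Gaussian-blur differencing followed by K-means clustering of the residual) can be sketched as follows. This is a minimal, training-free reconstruction of the idea, not the authors' released code: the kernel radius, sigma, and the 2-cluster setup are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel1d(sigma=1.0, radius=2):
    """Normalized 1-D Gaussian kernel."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur with edge padding (single-channel image)."""
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img, radius, mode='edge')
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, tmp)

def kmeans1d_2(values, iters=20):
    """Plain 2-cluster K-means on scalar values (Lloyd's algorithm)."""
    c = np.array([values.min(), values.max()], dtype=np.float64)
    for _ in range(iters):
        labels = np.abs(values[:, None] - c[None, :]).argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                c[j] = values[labels == j].mean()
    return labels, c

def highfreq_mask(img, sigma=1.0):
    """Binary mask marking high-frequency (edge/texture) pixels."""
    residual = np.abs(img - gaussian_blur(img, sigma))
    labels, centers = kmeans1d_2(residual.ravel())
    hi_cluster = int(np.argmax(centers))  # larger centroid = high-frequency
    return (labels == hi_cluster).reshape(img.shape).astype(np.uint8)
```

On a synthetic step-edge image, the resulting mask activates only around the edge, which is the behavior the method relies on to allocate computation.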
📝 Abstract
The primary challenge in accelerating image super-resolution lies in reducing computation while maintaining performance and adaptability. Motivated by the observation that high-frequency regions (e.g., edges and textures) are most critical for reconstruction, we propose a training-free adaptive masking module that dynamically focuses computation on these challenging areas. Specifically, our method first extracts high-frequency components via Gaussian blur subtraction and adaptively generates binary masks using K-means clustering to identify regions requiring intensive processing. The module integrates easily with both CNNs and Transformers. For CNN-based architectures, we replace standard $3\times 3$ convolutions with an unfold operation followed by $1\times 1$ convolutions, enabling pixel-wise sparse computation guided by the mask. For Transformer-based models, we partition the mask into non-overlapping windows and selectively process tokens based on each window's mean mask value. During inference, unnecessary pixels or windows are pruned, significantly reducing computation. Moreover, our method supports dilation-based mask adjustment to control the processing scope without retraining, and it is robust to unseen degradations (e.g., noise, compression). Extensive experiments on standard benchmarks demonstrate that our method reduces FLOPs by 24–43% for state-of-the-art models (e.g., CARN, SwinIR) while achieving comparable or better quantitative metrics. The source code is available at https://github.com/shangwei5/AMSR.
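For the CNN branch, the unfold + 1×1 formulation amounts to gathering each pixel's 3×3 neighborhood into a row vector and applying the flattened kernel only at mask-selected positions. The single-channel NumPy toy below illustrates this; the zero padding and the pass-through for skipped pixels are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def unfold3x3(x):
    """Extract zero-padded 3x3 patches of a 2-D array as rows of shape (H*W, 9)."""
    H, W = x.shape
    p = np.pad(x, 1)
    patches = np.empty((H, W, 9))
    idx = 0
    for dy in range(3):
        for dx in range(3):
            patches[:, :, idx] = p[dy:dy + H, dx:dx + W]
            idx += 1
    return patches.reshape(H * W, 9)

def sparse_conv3x3(x, weight, mask):
    """3x3 convolution computed only at masked pixels.

    weight: flattened 3x3 kernel, shape (9,); mask: binary (H, W).
    The matmul on unfolded columns plays the role of the 1x1 convolution;
    skipped pixels simply retain the input value in this sketch.
    """
    H, W = x.shape
    cols = unfold3x3(x)                # (H*W, 9)
    sel = mask.ravel().astype(bool)    # pixels chosen by the mask
    out = x.ravel().copy()             # pass-through for pruned pixels
    out[sel] = cols[sel] @ weight      # compute only where needed
    return out.reshape(H, W)
```

The FLOP saving comes directly from the row selection: only `mask.sum()` of the `H*W` positions incur the 9-multiply dot product.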