C2D-ISR: Optimizing Attention-based Image Super-resolution from Continuous to Discrete Scales

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing attention-based super-resolution models suffer from discrete-scale training constraints, limited cross-scale modeling capability, and low inference efficiency. To address these issues, we propose a Continuous-to-Discrete (C2D) optimization framework: first, continuous-scale pretraining to enhance multi-scale representation learning; second, discrete-scale fine-tuning to balance reconstruction accuracy and deployment compatibility. We innovatively extend hierarchical encoding to mainstream lightweight attention architectures—including SwinIR-L, SRFormer-L, and MambaIRv2-L—enabling effective cross-scale feature aggregation and accelerated inference. Extensive experiments demonstrate that our method achieves an average PSNR gain of 0.2 dB across multiple benchmarks while reducing computational complexity by up to 11%, significantly outperforming state-of-the-art approaches such as HiT.

📝 Abstract
In recent years, attention mechanisms have been exploited in single image super-resolution (SISR), achieving impressive reconstruction results. However, these advancements are still limited by the reliance on simple training strategies and network architectures designed for discrete up-sampling scales, which hinder the model's ability to effectively capture information across multiple scales. To address these limitations, we propose a novel framework, C2D-ISR, for optimizing attention-based image super-resolution models from both performance and complexity perspectives. Our approach is based on a two-stage training methodology and a hierarchical encoding mechanism. The new training methodology involves continuous-scale training for discrete-scale models, enabling the learning of inter-scale correlations and multi-scale feature representation. In addition, we generalize the hierarchical encoding mechanism to existing attention-based network structures, which achieves improved spatial feature fusion, cross-scale information aggregation, and, more importantly, much faster inference. We have evaluated the C2D-ISR framework on three efficient attention-based backbones, SwinIR-L, SRFormer-L, and MambaIRv2-L, and demonstrated significant improvements over the other existing optimization framework, HiT, in terms of super-resolution performance (up to 0.2 dB) and computational complexity reduction (up to 11%). The source code will be made publicly available at www.github.com.
Problem

Research questions and friction points this paper is trying to address.

Optimize attention-based image super-resolution models.
Address limitations in discrete up-sampling scales.
Improve performance and reduce computational complexity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous-scale training for discrete scale models
Hierarchical encoding mechanism for feature fusion
Improved spatial feature and cross-scale aggregation
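The continuous-to-discrete training schedule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, scale range, and discrete scale set are illustrative assumptions (SISR models are commonly trained for x2/x3/x4 up-sampling).

```python
import random

def sample_training_scale(stage, continuous_range=(1.0, 4.0), discrete_scales=(2, 3, 4)):
    """Pick an up-sampling scale for the current training batch.

    Stage "continuous" (pretraining): draw a scale uniformly from a
    continuous range so the model sees arbitrary magnifications and can
    learn inter-scale correlations.
    Stage "discrete" (fine-tuning): restrict to the fixed scales used at
    deployment time, balancing accuracy and deployment compatibility.
    """
    if stage == "continuous":
        return random.uniform(*continuous_range)
    if stage == "discrete":
        return random.choice(discrete_scales)
    raise ValueError(f"unknown stage: {stage}")

# Stage 1: continuous-scale pretraining batches
pretrain_scales = [sample_training_scale("continuous") for _ in range(5)]
# Stage 2: discrete-scale fine-tuning batches
finetune_scales = [sample_training_scale("discrete") for _ in range(5)]
```

In this sketch, each training batch would build its low-resolution inputs by downsampling with the sampled scale; only the scale-sampling policy changes between the two stages.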
Yuxuan Jiang
Visual Information Laboratory, University of Bristol, Bristol, BS1 5DD, UK
Chengxi Zeng
Visual Information Laboratory, University of Bristol, Bristol, BS1 5DD, UK
Siyue Teng
Visual Information Laboratory, University of Bristol, Bristol, BS1 5DD, UK
Fan Zhang
Visual Information Laboratory, University of Bristol, Bristol, BS1 5DD, UK
Xiaoqing Zhu
Netflix
video codec research, multimedia networking, machine learning, wireless networks
Joel Sole
Netflix Inc., Los Gatos, CA, USA, 95032
David R. Bull
Visual Information Laboratory, University of Bristol, Bristol, BS1 5DD, UK