AI Summary
Existing Mamba-based super-resolution methods lack fine-grained multi-scale modeling capability, limiting feature representation efficiency. To address this, we propose a lightweight multi-scale receptive field interaction framework that synergistically integrates windowed self-attention with a progressive Mamba mechanism, enabling joint global contextual awareness and local detail preservation under linear computational complexity. Furthermore, we design an adaptive high-frequency refinement module to ensure smooth multi-scale feature transitions and precise recovery of high-frequency details. Our approach unifies the long-range modeling strength of Transformers with the efficient state-space properties of Mamba. Extensive experiments demonstrate that the method consistently outperforms state-of-the-art Transformer- and Mamba-based baselines across multiple benchmarks, achieving superior PSNR and SSIM scores with significantly lower computational overhead, thus striking a favorable balance between accuracy and efficiency.
Abstract
Recently, Mamba-based super-resolution (SR) methods have demonstrated the ability to capture global receptive fields with linear complexity, addressing the quadratic computational cost of Transformer-based SR approaches. However, existing Mamba-based methods lack fine-grained transitions across different modeling scales, which limits the efficiency of feature representation. In this paper, we propose T-PMambaSR, a lightweight SR framework that integrates window-based self-attention with Progressive Mamba. By enabling interactions among receptive fields of different scales, our method establishes a fine-grained modeling paradigm that progressively enhances feature representation with linear complexity. Furthermore, we introduce an Adaptive High-Frequency Refinement Module (AHFRM) to recover high-frequency details lost during Transformer and Mamba processing. Extensive experiments demonstrate that T-PMambaSR progressively enlarges the model's receptive field and strengthens its expressiveness, yielding better performance than recent Transformer- or Mamba-based methods at lower computational cost. Our code will be released after acceptance.
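The core idea of combining window-based self-attention (local detail) with a linear-complexity state-space scan (global context) can be illustrated with a toy sketch. The following is a minimal, hypothetical NumPy sketch, not the paper's actual T-PMambaSR implementation: the attention is a bare softmax(QKᵀ/√d)V within non-overlapping windows, and the "Mamba" component is reduced to a simple diagonal state-space recurrence; all function names and the fixed decay parameter are assumptions for illustration.

```python
import numpy as np

def windowed_self_attention(x, window=4):
    """Toy window-based self-attention over a 1D token sequence.
    Q = K = V = x within each non-overlapping window (simplification;
    real blocks use learned projections and multiple heads)."""
    n, d = x.shape
    out = np.zeros_like(x)
    for s in range(0, n, window):
        w = x[s:s + window]                        # tokens in one window
        scores = w @ w.T / np.sqrt(d)              # pairwise similarities
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)    # rows sum to 1
        out[s:s + window] = attn @ w
    return out

def ssm_scan(x, decay=0.9):
    """Toy diagonal state-space recurrence h_t = a * h_{t-1} + x_t.
    Each output sees all previous tokens, giving a global receptive
    field in O(n) time (the Mamba-style linear scan, much simplified:
    real Mamba uses input-dependent, learned SSM parameters)."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out

def hybrid_block(x, window=4, decay=0.9):
    """Local window attention followed by a global SSM scan,
    each with a residual connection."""
    x = x + windowed_self_attention(x, window)
    x = x + ssm_scan(x, decay)
    return x
```

The sketch shows why the combination keeps linear complexity: the attention cost is O(n · w · d) for fixed window size w, and the scan is O(n · d), so neither term is quadratic in sequence length.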