🤖 AI Summary
To balance model complexity against reconstruction quality in lightweight image super-resolution (SR), this paper proposes a Structural Similarity-Inspired Unfolding (SSIU) network. The method has three parts: (1) an iterative reconstruction framework derived by unfolding an SR optimization objective constrained by structural similarity (SSIM); (2) Mixed-Scale Gating Modules (MSGM) and an Efficient Sparse Attention Module (ESAM), which model multi-scale context at low computational cost; (3) a Mixture-of-Experts-based Feature Selector (MoE-FS) that adaptively fuses features from different unfolding steps. On multiple benchmarks, the method achieves state-of-the-art performance while reducing parameter count by 32% and memory footprint by 28% and maintaining real-time inference, which improves the accuracy-efficiency trade-off.
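For orientation, the SSIM index that the summary refers to can be computed as in the sketch below. It uses the standard SSIM definition with the commonly used constants `k1 = 0.01` and `k2 = 0.03`, evaluated over a single window (the whole patch). The function name `global_ssim` and the flat-list input format are illustrative assumptions; how the paper actually embeds the structural similarity constraint in its objective is not stated here, so this should not be read as the authors' formulation.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized flat lists of pixel values.

    Illustrative helper (not from the paper): practical SSIM implementations
    average this quantity over many local windows instead of one global window.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and have the same length")

    # Stabilizing constants from the standard SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    # First- and second-order statistics of the two patches.
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    # Luminance term times contrast-structure term.
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical patches score 1 (up to floating-point rounding), and the score drops as local means, contrast, or structure diverge. A structural constraint would typically reward reconstructions whose SSIM with the target is high.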
📝 Abstract
Major efforts in data-driven image super-resolution (SR) primarily focus on expanding the receptive field of the model to better capture contextual information. However, these methods are typically implemented by stacking deeper networks or leveraging transformer-based attention mechanisms, which consequently increases model complexity. In contrast, model-driven methods based on the unfolding paradigm show promise in improving performance while effectively maintaining model compactness through sophisticated module design. Based on these insights, we propose a Structural Similarity-Inspired Unfolding (SSIU) method for efficient image SR. This method is designed through unfolding an SR optimization function constrained by structural similarity, aiming to combine the strengths of both data-driven and model-driven approaches. Our model operates progressively following the unfolding paradigm. Each iteration consists of multiple Mixed-Scale Gating Modules (MSGM) and an Efficient Sparse Attention Module (ESAM). The former implements comprehensive constraints on features, including a structural similarity constraint, while the latter aims to achieve sparse activation. In addition, we design a Mixture-of-Experts-based Feature Selector (MoE-FS) that fully utilizes multi-level feature information by combining features from different steps. Extensive experiments validate the efficacy and efficiency of our unfolding-inspired network. Our model outperforms current state-of-the-art models, boasting lower parameter counts and reduced memory consumption. Our code will be available at: https://github.com/eezkni/SSIU
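The unfolding paradigm and the step-wise feature fusion described in the abstract can be illustrated with a toy 1-D sketch. It is not the SSIU architecture: the pair-averaging degradation operator, the fixed step size, the zero initialization, and the hand-set gate scores are all assumptions made for illustration. In an unfolded network, learned modules (here, presumably MSGM and ESAM) would take the place of the hand-written update, and the gate scores would be predicted rather than supplied by hand.

```python
import math


def downsample(x):
    """Hypothetical degradation operator: average adjacent pairs (factor 2)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def downsample_adjoint(r):
    """Adjoint of `downsample`: spread each value evenly over its pair."""
    out = []
    for v in r:
        out.extend([v / 2, v / 2])
    return out


def unfolded_sr(y, steps=4, step_size=1.0):
    """Run a fixed number of gradient steps on 0.5 * ||D x - y||^2.

    Each loop iteration plays the role of one unfolded stage; the
    intermediate estimates are kept so they can be fused afterwards.
    """
    x = [0.0] * (2 * len(y))  # start from an all-zero high-resolution estimate
    states = []
    for _ in range(steps):
        residual = [a - b for a, b in zip(downsample(x), y)]
        grad = downsample_adjoint(residual)
        x = [xi - step_size * g for xi, g in zip(x, grad)]
        states.append(x)
    return states


def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(states, gate_scores):
    """Softmax-weighted combination of the per-step estimates."""
    weights = softmax(gate_scores)
    return [
        sum(w * s[i] for w, s in zip(weights, states))
        for i in range(len(states[0]))
    ]
```

Calling `unfolded_sr([1.0, 2.0], steps=3)` yields three successively refined estimates, and `fuse(states, [0.0, 0.0, 0.0])` averages them with equal weight. Raising one gate score shifts the fused output toward that step, which is the basic mechanism a mixture-of-experts selector relies on.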