🤖 AI Summary
Existing Mamba-based image super-resolution methods rely on fixed-grid scanning, which disrupts spatial-semantic structure and introduces reconstruction artifacts. To address this limitation, this work proposes SP-MoMamba, which treats superpixels as the fundamental units of state space modeling and builds a semantic-aware mixture-of-experts architecture on top of them. The method combines three components: a superpixel-driven state space model that preserves semantic coherence, a multi-scale superpixel-based mixture of experts that captures features at several granularities, and a local spatial modulation module that recovers fine details. Experiments on standard benchmarks show that SP-MoMamba achieves higher reconstruction fidelity than current efficient super-resolution approaches, with a more favorable trade-off between reconstruction quality and computational cost.
📝 Abstract
State space models (SSMs) have emerged as a powerful paradigm for efficient single-image super-resolution (SR) due to their linear complexity and long-range modeling capabilities. However, existing Mamba-based methods typically rely on data-agnostic rigid scanning, which reshapes 2D images into 1D sequences over a fixed grid, inevitably disrupting spatial-semantic topology and introducing artifacts. Inspired by Gestalt perceptual grouping theory, we propose SP-MoMamba, a superpixel-driven mixture of state space experts designed for content-aware SR. Our core idea is to replace traditional rigid scanning with semantic-level interaction by treating superpixels as fundamental units. Specifically, we introduce the Superpixel-driven State Space Model (SP-SSM), which compresses semantically homogeneous regions into high-order tokens to preserve global topological consistency. To address the conflict between fixed scanning scales and diverse semantic granularities, we develop the Multi-Scale Superpixel Mixture of State Space Experts (MSS-MoE). This module uses a dynamic routing mechanism to adaptively assign scale-specific experts, effectively capturing multi-scale textures while reducing computational redundancy. Furthermore, to prevent the loss of high-frequency details during global abstraction, we introduce a Local Spatial Modulation Expert (LSME) to complement the global modeling, ensuring precise reconstruction of sharp edges and fine structures. Extensive experiments on standard benchmarks demonstrate that SP-MoMamba achieves superior reconstruction fidelity and a more favorable efficiency-performance trade-off compared to state-of-the-art efficient SR methods.
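To make the idea of treating superpixels as tokens more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a superpixel label map is already available (for example from a SLIC-style segmentation), averages pixel features within each superpixel to form one token per region, broadcasts tokens back to pixels, and picks a scale with a toy softmax gate. The function names `superpixel_pool`, `superpixel_unpool`, and `route_top1`, the mean-pooling choice, and the gate weights are all hypothetical; the state space scan over the tokens is omitted entirely.

```python
import numpy as np


def superpixel_pool(features, labels, num_segments):
    """Average the features of all pixels that share a superpixel label.

    features: (H, W, C) float array.
    labels:   (H, W) int array with values in [0, num_segments).
    Returns a (num_segments, C) array, one token per superpixel.
    """
    c = features.shape[-1]
    flat_feat = features.reshape(-1, c)
    flat_lab = labels.reshape(-1)
    sums = np.zeros((num_segments, c), dtype=np.float64)
    np.add.at(sums, flat_lab, flat_feat)  # unbuffered scatter-add per label
    counts = np.bincount(flat_lab, minlength=num_segments)
    # Guard against empty segments; their tokens stay at zero.
    return sums / np.maximum(counts, 1)[:, None]


def superpixel_unpool(tokens, labels):
    """Broadcast each token back to every pixel of its superpixel."""
    return tokens[labels]  # shape (H, W, C)


def route_top1(tokens_per_scale, gate_weights):
    """Toy router (assumption): choose one scale from a softmax gate.

    tokens_per_scale: list of (S_k, C) token arrays, one per scale.
    gate_weights:     (num_scales, C) array of hypothetical gate weights.
    Returns the index of the selected scale and the gate probabilities.
    """
    descriptor = tokens_per_scale[0].mean(axis=0)  # (C,) summary of scale 0
    logits = gate_weights @ descriptor             # (num_scales,)
    probs = np.exp(logits - logits.max())          # numerically stable softmax
    probs /= probs.sum()
    return int(np.argmax(probs)), probs


# Toy input: a 4x4 single-channel image with two homogeneous halves.
features = np.array([[1.0, 1.0, 3.0, 3.0]] * 4)[..., None]
fine_labels = np.array([[0, 0, 1, 1]] * 4)       # two superpixels
coarse_labels = np.zeros((4, 4), dtype=int)      # one superpixel

fine_tokens = superpixel_pool(features, fine_labels, 2)
coarse_tokens = superpixel_pool(features, coarse_labels, 1)
restored = superpixel_unpool(fine_tokens, fine_labels)

gate = np.array([[1.0], [-1.0]])                 # hypothetical gate weights
chosen_scale, gate_probs = route_top1([fine_tokens, coarse_tokens], gate)
```

In this toy case the fine label map matches the image content, so pooling followed by unpooling reproduces the input exactly, while the coarse map collapses both halves into a single averaged token. A sequence model would then operate on the short token sequence rather than on all 16 pixels, which is the source of the reduced redundancy the abstract describes.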