AI Summary
Existing Mamba-based models employ a one-dimensional scanning strategy that neglects the intrinsic two-dimensional spatial structure of images, leading to spatial misalignment and loss of local dependencies in image deblurring, thus compromising the joint optimization of distortion fidelity (e.g., PSNR/SSIM) and perceptual quality. To address this, we propose the Visual State Space Module (VSSM), which introduces an alternating XY-slicing scanning mechanism to explicitly capture cross-dimensional spatial dependencies. Additionally, we design a lightweight multi-scale feature fusion module that enhances perceptual realism without sacrificing reconstruction accuracy. Evaluated on standard deblurring benchmarks, our method achieves a 17% reduction in Kernel Inception Distance (KID) over the prior state of the art, establishing a new SOTA in perceptual quality while maintaining competitive distortion metrics (PSNR/SSIM). The framework further offers improved interpretability and computational efficiency.
Abstract
Deep state-space models (SSMs), like recent Mamba architectures, are emerging as a promising alternative to CNN and Transformer networks. Existing Mamba-based restoration methods process visual data with a flatten-and-scan strategy that converts image patches into a 1D sequence before scanning. However, this scanning paradigm ignores local pixel dependencies and introduces spatial misalignment by placing distant pixels incorrectly adjacent in the sequence, which reduces local noise-awareness and degrades image sharpness in low-level vision tasks. To overcome these issues, we propose a novel slice-and-scan strategy that alternates scanning along intra- and inter-slice directions. We further design a new Vision State Space Module (VSSM) for image deblurring, and tackle the inefficiency of current Mamba-based vision modules. Building upon this, we develop XYScanNet, an SSM architecture integrated with a lightweight feature fusion module for enhanced image deblurring. XYScanNet maintains competitive distortion metrics while significantly improving perceptual performance. Experimental results show that XYScanNet improves KID by 17% compared to the nearest competitor. Our code will be released soon.
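To make the contrast with flatten-and-scan concrete, here is a minimal NumPy sketch of one plausible slice-and-scan ordering. The paper's exact scan pattern is not specified in this abstract, so the function name, the `slice_h` parameter, and the boustrophedon (direction-alternating) traversal are illustrative assumptions: the feature map is cut into horizontal slices, pixels are scanned along X within each slice, and consecutive slices alternate scan direction so that spatially adjacent pixels stay close together in the resulting 1D sequence, unlike plain row-major flattening, which places the end of one row next to the far-away start of the next.

```python
import numpy as np

def slice_and_scan(feat, slice_h=2):
    """Illustrative slice-and-scan ordering (an assumed sketch, not the
    paper's exact algorithm).

    feat: (H, W) feature map. The map is cut into horizontal slices of
    height `slice_h`; each slice is raster-scanned along X (intra-slice),
    and odd-indexed slices are reversed along X (inter-slice alternation)
    so the sequence "turns around" instead of jumping across the image.
    """
    H, W = feat.shape
    pieces = []
    for i, s in enumerate(range(0, H, slice_h)):
        sl = feat[s:s + slice_h]          # one horizontal slice
        if i % 2 == 1:                    # alternate X direction
            sl = sl[:, ::-1]              # reverse odd slices
        pieces.append(sl.reshape(-1))     # intra-slice scan
    return np.concatenate(pieces)         # 1D sequence fed to the SSM

x = np.arange(16).reshape(4, 4)
seq = slice_and_scan(x, slice_h=2)
# seq: [0 1 2 3 4 5 6 7 11 10 9 8 15 14 13 12]
```

Note how pixel 7 (row 1, col 3) is immediately followed by pixel 11 (row 2, col 3), its vertical neighbor, whereas row-major flattening would follow 7 with pixel 8 from the opposite side of the image.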