AI Summary
Existing Mamba-based models employ a one-dimensional scanning strategy that neglects the intrinsic two-dimensional spatial structure of images, leading to spatial misalignment and loss of local dependencies in image deblurring, thus compromising the joint optimization of distortion fidelity (e.g., PSNR/SSIM) and perceptual quality. To address this, we propose the Visual State Space Module (VSSM), which introduces an alternating XY-slicing scanning mechanism to explicitly capture cross-dimensional spatial dependencies. Additionally, we design a lightweight multi-scale feature fusion module that enhances perceptual realism without sacrificing reconstruction accuracy. Evaluated on standard deblurring benchmarks, our method achieves a 17% reduction in Kernel Inception Distance (KID) over the prior state of the art, establishing a new SOTA in perceptual quality while maintaining competitive distortion metrics (PSNR/SSIM). The framework further offers improved interpretability and computational efficiency.
Abstract
Deep state-space models (SSMs), like recent Mamba architectures, are emerging as a promising alternative to CNN and Transformer networks. Existing Mamba-based restoration methods process visual data with a flatten-and-scan strategy that converts image patches into a 1D sequence before scanning. However, this scanning paradigm ignores local pixel dependencies and introduces spatial misalignment by placing distant pixels incorrectly adjacent in the sequence, which reduces local noise-awareness and degrades image sharpness in low-level vision tasks. To overcome these issues, we propose a novel slice-and-scan strategy that alternates scanning along intra- and inter-slice directions. We further design a new Vision State Space Module (VSSM) for image deblurring, and tackle the inefficiency of current Mamba-based vision modules. Building upon this, we develop XYScanNet, an SSM architecture integrated with a lightweight feature fusion module for enhanced image deblurring. XYScanNet maintains competitive distortion metrics while significantly improving perceptual performance. Experimental results show that XYScanNet improves KID by 17% compared to the nearest competitor. Our code will be released soon.
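To make the contrast with flatten-and-scan concrete, here is a minimal NumPy sketch of one plausible slice-and-scan ordering. The paper's exact scan pattern is not specified in this abstract, so the function name, the `slice_h` parameter, and the boustrophedon (direction-alternating) traversal are illustrative assumptions: the feature map is cut into horizontal slices, pixels are scanned along X within each slice, and consecutive slices alternate scan direction so that spatially adjacent pixels stay close together in the resulting 1D sequence, unlike plain row-major flattening, which places the end of one row next to the far-away start of the next.

```python
import numpy as np

def slice_and_scan(feat, slice_h=2):
    """Illustrative slice-and-scan ordering (an assumed sketch, not the
    paper's exact algorithm).

    feat: (H, W) feature map. The map is cut into horizontal slices of
    height `slice_h`; each slice is raster-scanned along X (intra-slice),
    and odd-indexed slices are reversed along X (inter-slice alternation)
    so the sequence "turns around" instead of jumping across the image.
    """
    H, W = feat.shape
    pieces = []
    for i, s in enumerate(range(0, H, slice_h)):
        sl = feat[s:s + slice_h]          # one horizontal slice
        if i % 2 == 1:                    # alternate X direction
            sl = sl[:, ::-1]              # reverse odd slices
        pieces.append(sl.reshape(-1))     # intra-slice scan
    return np.concatenate(pieces)         # 1D sequence fed to the SSM

x = np.arange(16).reshape(4, 4)
seq = slice_and_scan(x, slice_h=2)
# seq: [0 1 2 3 4 5 6 7 11 10 9 8 15 14 13 12]
```

Note how pixel 7 (row 1, col 3) is immediately followed by pixel 11 (row 2, col 3), its vertical neighbor, whereas row-major flattening would follow 7 with pixel 8 from the opposite side of the image.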