SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image style transfer methods—particularly those built on CNN or Transformer backbones—incur high computational cost and slow inference when modeling global receptive fields. To address this, we propose SaMam, an efficient state space model (SSM)-based framework tailored for arbitrary style transfer. Our key contributions are threefold: (1) a novel style-aware Mamba encoder-decoder architecture; (2) a local enhancement module coupled with a zigzag spatial scanning strategy to mitigate intrinsic SSM limitations—including local pixel forgetting, channel redundancy, and spatial discontinuity; and (3) a style-conditioned state space modeling mechanism. Experiments demonstrate that SaMam achieves superior qualitative and quantitative performance over current state-of-the-art methods while maintaining O(N) linear time complexity, simultaneously improving style fidelity, content preservation, and inference speed.
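The O(N) claim comes from the SSM's recurrent form: each token updates a fixed-size hidden state in one sequential pass, instead of computing pairwise interactions as self-attention does. A minimal illustrative sketch (scalar `A`, `B`, `C` parameters chosen for demonstration, not the paper's learned selective parameters):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear-time state space scan (illustrative only).

    h_t = A * h_{t-1} + B * x_t
    y_t = C * h_t

    One pass over N tokens -> O(N) complexity, unlike the O(N^2)
    pairwise interactions of self-attention.
    """
    N, d = x.shape          # N tokens, d channels
    h = np.zeros(d)         # fixed-size recurrent state
    y = np.empty_like(x)
    for t in range(N):
        h = A * h + B * x[t]   # recurrent state update
        y[t] = C * h           # readout
    return y

tokens = np.random.randn(16, 4)  # 16 "pixels", 4 channels
out = ssm_scan(tokens, A=0.9, B=1.0, C=1.0)
print(out.shape)  # (16, 4)
```

Mamba additionally makes `A`, `B`, `C` input-dependent ("selective") and computes the scan with a parallel algorithm, but the per-token state update above is what keeps the cost linear in sequence length.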

📝 Abstract
A global effective receptive field plays a crucial role in image style transfer (ST) for obtaining high-quality stylized results. However, existing ST backbones (e.g., CNNs and Transformers) suffer from huge computational complexity to achieve global receptive fields. Recently, the State Space Model (SSM), especially its improved variant Mamba, has shown great potential for long-range dependency modeling with linear complexity, which offers an approach to resolve the above dilemma. In this paper, we develop a Mamba-based style transfer framework, termed SaMam. Specifically, a Mamba encoder is designed to efficiently extract content and style information. In addition, a style-aware Mamba decoder is developed to flexibly adapt to various styles. Moreover, to address the problems of local pixel forgetting, channel redundancy, and spatial discontinuity in existing SSMs, we introduce both local enhancement and zigzag scan. Qualitative and quantitative results demonstrate that our SaMam outperforms state-of-the-art methods in terms of both accuracy and efficiency.
Problem

Research questions and friction points this paper is trying to address.

Achieves global receptive fields with linear complexity
Addresses local pixel forgetting and spatial discontinuity
Enhances style transfer accuracy and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba-based framework for style transfer
Local enhancement and zigzag scan techniques
Efficient content and style information extraction
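The zigzag scan addresses the spatial discontinuity that a plain raster scan introduces: a raster scan jumps from the end of one row to the start of the next, so spatially distant pixels become sequence neighbors. A hedged sketch of a boustrophedon ordering (illustrative; not necessarily the authors' exact scan scheme):

```python
def zigzag_order(H, W):
    """Zigzag (boustrophedon) scan order for an H x W feature map.

    Even rows run left-to-right, odd rows right-to-left, so every pair
    of consecutive tokens in the flattened 1-D sequence is spatially
    adjacent on the 2-D grid.
    """
    order = []
    for r in range(H):
        cols = range(W) if r % 2 == 0 else range(W - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order

print(zigzag_order(3, 3))
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2)]
```

Feeding tokens to the SSM in this order keeps the recurrent state's most recent inputs spatially local, which is the property the paper pairs with its local enhancement module.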
Hongda Liu
Sun Yat-sen University
Computer Vision, Low-level Vision, Image Restoration, Style Transfer
Longguang Wang
NUDT
Low-level Vision, 3D Vision, Deep Learning
Ye Zhang
The Shenzhen Campus of Sun Yat-Sen University, Sun Yat-Sen University
Ziru Yu
The Shenzhen Campus of Sun Yat-Sen University, Sun Yat-Sen University
Yulan Guo
Professor, Sun Yat-sen University
3D Vision, Machine Learning, Robotics