🤖 AI Summary
Existing artistic style transfer methods struggle to model local details and global semantics simultaneously, often producing artifacts, stylistic inconsistencies, or inefficient inference. This work proposes StyMam, the first style transfer generator based on the Mamba architecture, which integrates a residual dual-path strip scanning mechanism to capture fine-grained textures and a channel-reweighted spatial attention module to model long-range global dependencies. Embedded within a GAN framework, StyMam enables efficient training while preserving content structure. The proposed method improves both generation quality and inference speed, outperforming state-of-the-art approaches in quantitative metrics and qualitative comparisons.
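Neither the summary nor the abstract specifies how the strip scanning is implemented; the sketch below is one plausible reading, assuming horizontal and vertical strip-wise scans fused by a 1x1 convolution with a residual connection. Bidirectional GRUs stand in for Mamba SSM blocks so the example stays self-contained, and all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class ResidualDualPathStripScan(nn.Module):
    """Hypothetical dual-path strip scan; GRUs are stand-ins for Mamba blocks."""

    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0, "even channel count assumed for the bidirectional stand-in"
        # One sequence model per scan direction.
        self.scan_h = nn.GRU(channels, channels // 2, batch_first=True, bidirectional=True)
        self.scan_v = nn.GRU(channels, channels // 2, batch_first=True, bidirectional=True)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Horizontal path: every row becomes a length-w sequence.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.scan_h(rows)
        out_h = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Vertical path: every column becomes a length-h sequence.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.scan_v(cols)
        out_v = cols.reshape(b, w, h, c).permute(0, 3, 2, 1)
        # Fuse both paths and add the input back (the residual).
        return x + self.fuse(torch.cat([out_h, out_v], dim=1))
```

Because each strip is a short 1-D sequence, this pattern keeps per-step cost linear in the strip length, which is the usual motivation for Mamba-style scanning over quadratic attention.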
📝 Abstract
Image style transfer aims to integrate the visual patterns of a specific artistic style into a content image while preserving its content structure. Existing methods mainly rely on generative adversarial networks (GANs) or Stable Diffusion (SD). GAN-based approaches built on CNNs or Transformers struggle to jointly capture local and global dependencies, leading to artifacts and disharmonious patterns. SD-based methods reduce such issues but often fail to preserve content structures and suffer from slow inference. To address these issues, we revisit GANs and propose a Mamba-based generator, termed StyMam, that produces high-quality stylized images without introducing artifacts or disharmonious patterns. Specifically, the generator combines a residual dual-path strip scanning mechanism with a channel-reweighted spatial attention module: the former efficiently captures local texture features, while the latter models global dependencies. Extensive qualitative and quantitative experiments demonstrate that the proposed method outperforms state-of-the-art algorithms in both quality and speed.
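The abstract does not detail the channel-reweighted spatial attention module; as a rough illustration, the following sketch assumes a squeeze-and-excitation-style channel gate applied before single-head spatial self-attention, matching the stated role of modeling long-range global dependencies. Names and the reduction factor are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelReweightedSpatialAttention(nn.Module):
    """Hypothetical module: channel gate followed by global spatial attention."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel reweighting: global average pool -> bottleneck MLP -> sigmoid gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # 1x1 projections for query/key/value spatial attention.
        self.q = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.k = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x = x * self.channel_gate(x)                 # reweight channels first
        q = self.q(x).flatten(2).transpose(1, 2)     # (b, hw, c')
        k = self.k(x).flatten(2)                     # (b, c', hw)
        attn = torch.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)  # (b, hw, hw)
        v = self.v(x).flatten(2).transpose(1, 2)     # (b, hw, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                               # residual connection
```

Note that the hw x hw attention map is quadratic in spatial size; in this reading it supplies the global dependencies that the linear-cost strip-scanning path does not cover.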