SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer

📅 2025-03-06

📈 Citations: 0

✨ Influential: 0

career value

163K/year

🤖 AI Summary

Existing attention-based arbitrary style transfer methods (CNN-, Transformer-, or Diffusion-based) suffer from region-level style misalignment when content and style images share semantic consistency, primarily due to neglecting the coupling between local texture patterns and semantic regions. Method: We propose a plug-and-play Semantic Continuous–Sparse Attention (SCSA) mechanism that explicitly decouples modeling into two complementary components: (i) continuous holistic style representation within semantically coherent regions—guided by semantic segmentation and realized via continuous attention; and (ii) sparse local texture representation—obtained through semantic-constrained sparse similarity retrieval. These components are jointly optimized to enforce semantic alignment while preserving fine-grained texture fidelity. Results: On multiple benchmarks, our method achieves a 12.6% reduction in FID and a 23.4% improvement in semantic style matching accuracy, significantly outperforming diverse state-of-the-art baselines.

Technology Category

Application Category

📝 Abstract

Attention-based arbitrary style transfer methods, including CNN-based, Transformer-based, and Diffusion-based, have flourished and produced high-quality stylized images. However, they perform poorly on the content and style images with the same semantics, i.e., the style of the corresponding semantic region of the generated stylized image is inconsistent with that of the style image. We argue that the root cause lies in their failure to consider the relationship between local regions and semantic regions. To address this issue, we propose a plug-and-play semantic continuous-sparse attention, dubbed SCSA, for arbitrary semantic style transfer -- each query point considers certain key points in the corresponding semantic region. Specifically, semantic continuous attention ensures each query point fully attends to all the continuous key points in the same semantic region that reflect the overall style characteristics of that region; Semantic sparse attention allows each query point to focus on the most similar sparse key point in the same semantic region that exhibits the specific stylistic texture of that region. By combining the two modules, the resulting SCSA aligns the overall style of the corresponding semantic regions while transferring the vivid textures of these regions. Qualitative and quantitative results prove that SCSA enables attention-based arbitrary style transfer methods to produce high-quality semantic stylized images.

Problem

Research questions and friction points this paper is trying to address.

Inconsistent style transfer in semantic regions

Lack of local and semantic region relationship consideration

Need for high-quality semantic style transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play semantic continuous-sparse attention mechanism

Aligns overall style and vivid textures in semantic regions

Enhances arbitrary style transfer with semantic consistency

🔎 Similar Papers

StyleShot: A Snapshot on Any Style