🤖 AI Summary
Existing attention-based arbitrary style transfer methods neglect global stylistic structure because they rely on local point-wise matching, and they incur quadratic computational complexity. To address these limitations, we propose the Holistic Style Injector (HSI) module, which replaces local attention with a global style representation in three steps: first, it extracts a compact global encoding from the style image; second, it establishes a dual-relational modeling mechanism guided by semantic similarity; and third, it achieves efficient style injection via element-wise feature modulation and lightweight attention. Crucially, HSI eliminates dense matrix multiplication, reducing computational complexity to linear while jointly optimizing content structure preservation and style fidelity. Extensive experiments demonstrate that HSI consistently outperforms state-of-the-art methods both qualitatively and quantitatively, particularly in large-image style transfer, where it delivers significant improvements in visual quality and inference efficiency.
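The three steps above can be illustrated with a toy, pure-Python sketch. It is not the authors' HSI implementation, and every concrete choice in it is an assumption. The "compact global encoding" is taken to be channel-wise mean and standard deviation (AdaIN-style statistics). The learned dual relation mechanism is replaced by a fixed cosine-similarity gate. The hypothetical names `channel_stats`, `cosine`, and `inject_style` are for illustration only. The point is structural: no N_c × N_s attention matrix is formed, so the work grows linearly with the number of positions.

```python
import math
from typing import List, Tuple

# Features are stored as N positions x C channels (a flattened H*W feature map).
Matrix = List[List[float]]


def channel_stats(feats: Matrix, eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Channel-wise mean and std over all positions: a compact global encoding (assumption)."""
    n, c = len(feats), len(feats[0])
    mean = [sum(row[j] for row in feats) / n for j in range(c)]
    std = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in feats) / n + eps) for j in range(c)]
    return mean, std


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def inject_style(content: Matrix, style: Matrix) -> Matrix:
    """Hypothetical holistic injection: no N_c x N_s attention matrix is ever formed."""
    # Step 1: one pass over the style positions yields a fixed-size global encoding.
    mean_s, std_s = channel_stats(style)
    mean_c, std_c = channel_stats(content)
    out = []
    for row in content:
        # Step 2 (stand-in for dual relation learning): a per-position weight in [0, 1]
        # from the cosine similarity between this content feature and the global style mean.
        w = 0.5 * (1.0 + cosine(row, mean_s))
        # Step 3: element-wise modulation with the global statistics (AdaIN-style).
        styled = [(x - mc) / sc * ss + ms
                  for x, mc, sc, ms, ss in zip(row, mean_c, std_c, mean_s, std_s)]
        # Blend stylized and original features per position: O(C) work per position.
        out.append([w * s + (1.0 - w) * x for s, x in zip(styled, row)])
    return out


content = [[0.1, 0.4, -0.2], [0.3, -0.1, 0.5], [0.0, 0.2, 0.2], [-0.4, 0.1, 0.0]]
style = [[1.0, 2.0, 0.5], [1.5, 1.0, 0.0]]
result = inject_style(content, style)
print(len(result), len(result[0]))  # prints: 4 3
```

Because the style image is summarized once into fixed-size vectors, the total cost is O((N_c + N_s) · C), not the O(N_c · N_s · C) of point-wise attention. A learned relation module would replace the fixed gate in step 2.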
📝 Abstract
Attention-based arbitrary style transfer methods have recently gained significant attention due to their impressive ability to synthesize style details. However, the point-wise matching within the attention mechanism may focus excessively on local patterns and thus neglect the salient global features of style images. Additionally, when processing large images, the quadratic complexity of the attention mechanism incurs a heavy computational load. To alleviate these problems, we propose the Holistic Style Injector (HSI), a novel attention-style transformation module that delivers the artistic expression of the target style. Specifically, HSI performs stylization based solely on a global style representation, which better matches the characteristics of style transfer, thereby avoiding locally disharmonious patterns in stylized images. Moreover, we propose a dual relation learning mechanism inside HSI that dynamically renders images by leveraging the semantic similarity between content and style, ensuring that stylized images preserve the original content while improving style fidelity. Note that HSI achieves linear computational complexity because it establishes feature mapping through element-wise multiplication rather than matrix multiplication. Qualitative and quantitative results demonstrate that our method outperforms state-of-the-art approaches in both effectiveness and efficiency.
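The linear-versus-quadratic claim can be made concrete with a rough multiply-accumulate count. This is a back-of-the-envelope sketch under stated assumptions, not a measurement of HSI. Attention is counted as the N_c × N_s similarity matrix plus the weighted sum. Global-representation injection is counted as one pooling pass over the style positions plus one element-wise multiply per content position and channel. The 512-channel width and the feature-map sizes are hypothetical. The count ignores projections, softmax, and the lightweight attention mentioned in the summary.

```python
def attention_macs(n_content: int, n_style: int, channels: int) -> int:
    """Rough multiply-accumulate count for point-wise attention:
    the N_c x N_s similarity matrix Q K^T plus the weighted sum A V."""
    return 2 * n_content * n_style * channels


def global_modulation_macs(n_content: int, n_style: int, channels: int) -> int:
    """Rough count for global-representation injection: one pooling pass over the
    style positions plus one element-wise multiply per content position and channel."""
    return n_style * channels + n_content * channels


channels = 512  # hypothetical channel width
for side in (32, 64, 128):  # hypothetical square feature-map sizes
    n = side * side
    attn = attention_macs(n, n, channels)
    glob = global_modulation_macs(n, n, channels)
    print(f"{side}x{side}: attention={attn:.2e}  global={glob:.2e}  ratio={attn // glob}")
```

With N_c = N_s = N, the ratio is 2N²C / 2NC = N. Doubling the side of the feature map therefore quadruples the gap, which is consistent with the abstract's emphasis on large images.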