HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a core challenge in diffusion-based style transfer: preserving the identity of the content image while faithfully expressing the target style. The authors propose a training-free, plug-and-play method built on a pre-trained diffusion model. It combines style-aware noise initialization with a heterogeneous attention modulation (HAM) mechanism consisting of Global Attention Regulation (GAR) and Local Attention Transplantation (LAT). The method accepts either a reference image or a text prompt as style guidance and aims to disentangle style from content during generation. The authors report that it outperforms existing methods on multiple quantitative metrics, and their qualitative comparisons show high-fidelity stylization that preserves the structure of the original content.
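The summary names style-aware noise initialization but gives no formula for it. A minimal sketch follows. It assumes the initialization works like AdaIN applied to diffusion latents, a pattern used in some earlier training-free style transfer work and not confirmed for HAM. Under that assumption, the content latent keeps its spatial layout while each channel adopts the style latent's mean and standard deviation. The names `adain` and `style_aware_init`, the (channels, height, width) layout, and the random stand-in latents are all hypothetical.

```python
import numpy as np


def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Give each channel of `content` the per-channel mean/std of `style`.

    Both arrays are shaped (channels, height, width), like a diffusion latent.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps  # eps avoids division by zero
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return (content - c_mean) / c_std * s_std + s_mean


def style_aware_init(content_latent: np.ndarray, style_latent: np.ndarray) -> np.ndarray:
    # Hypothetical initialization (not the paper's confirmed method): keep the
    # content latent's spatial pattern but adopt the style latent's channel statistics.
    return adain(content_latent, style_latent)


# Random stand-ins for content and style latents; the shape is illustrative only.
rng = np.random.default_rng(0)
z_content = rng.standard_normal((4, 64, 64))
z_style = 2.0 * rng.standard_normal((4, 64, 64)) + 0.5
z_init = style_aware_init(z_content, z_style)
print(z_init.shape)  # (4, 64, 64)
```

Only per-channel first- and second-order statistics change here. The spatial arrangement of the content latent is therefore preserved up to a per-channel affine rescaling.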

📝 Abstract
Diffusion models have demonstrated remarkable performance in image generation, particularly in style transfer. Prevailing style transfer approaches typically combine the robust feature extraction capabilities of pre-trained diffusion models with external modular control pathways that explicitly impose style guidance signals. However, these methods often fail to capture complex style references or to retain the identity of user-provided content images, and thus fall into the style-content trade-off. We therefore propose a training-free style transfer approach via heterogeneous attention modulation (HAM) that protects identity information during image- or text-guided style transfer, thereby addressing the style-content trade-off. Specifically, we first introduce style noise initialization to initialize the latent noise for diffusion. Then, during the diffusion process, we apply HAM to different attention mechanisms, namely Global Attention Regulation (GAR) and Local Attention Transplantation (LAT), which together better preserve the details of the content image while capturing complex style references. Our approach is validated through a series of qualitative and quantitative experiments and achieves state-of-the-art performance on multiple quantitative metrics.
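The abstract does not specify how GAR and LAT modify attention. The NumPy sketch below is one hypothetical reading that borrows two common training-free tricks. The first "transplants" content queries into the generated branch by linear blending (`gamma`). The second "regulates" attention globally with a single temperature factor on the logits (`tau`). The function name, both parameters, and this mapping onto GAR and LAT are assumptions for illustration, not the authors' formulation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def modulated_attention(q_content, q_stylized, k_style, v_style, gamma=0.75, tau=1.5):
    """Hypothetical attention modulation step (not the paper's exact GAR/LAT).

    q_content:  queries from the content-image branch, shape (tokens, dim)
    q_stylized: queries from the branch being generated, shape (tokens, dim)
    k_style, v_style: keys/values from the style branch, shape (style_tokens, dim)
    """
    # Assumed "transplantation": blend content queries into the generated branch
    # so the query layout stays close to the content image.
    q = gamma * q_content + (1.0 - gamma) * q_stylized
    # Assumed "global regulation": one temperature factor scales all logits;
    # tau > 1 sharpens the attention distribution in every row.
    logits = tau * (q @ k_style.T) / np.sqrt(q.shape[-1])
    weights = softmax(logits, axis=-1)
    # Each output token is a convex combination of style values.
    return weights @ v_style, weights


rng = np.random.default_rng(0)
tokens, style_tokens, dim = 16, 32, 8
q_content = rng.standard_normal((tokens, dim))
q_stylized = rng.standard_normal((tokens, dim))
k_style = rng.standard_normal((style_tokens, dim))
v_style = rng.standard_normal((style_tokens, dim))

out, weights = modulated_attention(q_content, q_stylized, k_style, v_style)
print(out.shape)  # (16, 8)
```

A `tau` above 1 sharpens every attention row and lowers its entropy. A `gamma` closer to 1 keeps more of the content branch's queries, which in this sketch is intended to retain more of the content structure.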
Problem

Research questions and friction points this paper is trying to address.

style transfer
diffusion models
style-content balance
identity preservation
training-free
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous Attention Modulation
Training-Free Style Transfer
Diffusion Models
Global Attention Regulation
Local Attention Transplantation