RegionRoute: Regional Style Transfer with Diffusion Model

📅 2026-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of spatially precise style control in diffusion-based image generation, where existing methods struggle to confine stylistic attributes to specific objects without manual masks or post-processing. The authors propose an attention-supervised diffusion framework that aligns the attention maps of style tokens with target region masks during training, enabling single-stage, mask-free local style transfer. The approach introduces a joint loss combining Focus (KL divergence) and Cover (binary cross-entropy) objectives to enhance spatial alignment, alongside a modular LoRA-MoE architecture for efficient multi-style adaptation. Experiments demonstrate that, at inference, the method produces visually coherent outputs with high regional fidelity, significantly outperforming current techniques in both localized style matching and identity preservation.
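The joint training objective described above can be sketched as follows. This is a minimal illustration assuming per-token attention scores flattened over spatial locations and a binary region mask; the function name, tensor shapes, and the exact normalization are assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def attention_supervision_loss(attn, mask, eps=1e-8):
    """Sketch of a Focus (KL) + Cover (BCE) joint objective.

    attn: (B, N) attention scores of a style token over N spatial locations
    mask: (B, N) binary mask of the target region
    """
    # Focus loss (KL divergence): pull the normalized attention
    # distribution toward the normalized mask distribution.
    attn_p = attn / (attn.sum(dim=-1, keepdim=True) + eps)
    mask_p = mask / (mask.sum(dim=-1, keepdim=True) + eps)
    focus = F.kl_div((attn_p + eps).log(), mask_p, reduction="batchmean")

    # Cover loss (binary cross-entropy): treat per-pixel attention as a
    # soft region prediction, encouraging dense coverage of the mask.
    cover = F.binary_cross_entropy(attn.clamp(eps, 1.0 - eps), mask)

    return focus + cover
```

Under this sketch, attention concentrated inside the target region yields a lower loss than attention of the same magnitude placed outside it.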

📝 Abstract
Precise spatial control in diffusion-based style transfer remains challenging. This challenge arises because diffusion models treat style as a global feature and lack explicit spatial grounding of style representations, making it difficult to restrict style application to specific objects or regions. To our knowledge, existing diffusion models are unable to perform true localized style transfer, typically relying on handcrafted masks or multi-stage post-processing that introduce boundary artifacts and limit generalization. To address this, we propose an attention-supervised diffusion framework that explicitly teaches the model where to apply a given style by aligning the attention scores of style tokens with object masks during training. Two complementary objectives, a Focus loss based on KL divergence and a Cover loss using binary cross-entropy, jointly encourage accurate localization and dense coverage. A modular LoRA-MoE design further enables efficient and scalable multi-style adaptation. To evaluate localized stylization, we introduce the Regional Style Editing Score, which measures Regional Style Matching through CLIP-based similarity within the target region and Identity Preservation via masked LPIPS and pixel-level consistency on unedited areas. Experiments show that our method achieves mask-free, single-object style transfer at inference, producing regionally accurate and visually coherent results that outperform existing diffusion-based editing approaches.
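The Identity Preservation component of the proposed Regional Style Editing Score can be illustrated with a small sketch. The abstract specifies masked LPIPS plus pixel-level consistency on unedited areas; the sketch below shows only the pixel-level part (masked mean-squared difference), since LPIPS requires a pretrained perceptual network. The function name and the scoring convention are assumptions for illustration.

```python
import torch

def identity_preservation(original, edited, region_mask, eps=1e-8):
    """Pixel-level consistency on unedited areas (sketch; the paper
    additionally uses masked LPIPS, omitted here for self-containment).

    original, edited: (B, C, H, W) images with values in [0, 1]
    region_mask:      (B, 1, H, W) binary mask of the edited region
    """
    keep = 1.0 - region_mask                      # pixels outside the edit
    sq_err = (original - edited) ** 2 * keep      # mask out edited region
    channels = original.shape[1]
    mse = sq_err.sum() / (keep.sum() * channels + eps)
    return 1.0 - mse.clamp(0.0, 1.0)              # higher = better preserved
```

With this convention, changes confined to the masked region leave the score at its maximum, while any change outside the region lowers it.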
Problem

Research questions and friction points this paper is trying to address.

spatial control
localized style transfer
diffusion model
style representation
region-specific editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

attention-supervised diffusion
regional style transfer
LoRA-MoE
mask-free editing
spatial grounding
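The LoRA-MoE idea listed above can be sketched as a linear layer with a frozen base weight plus several low-rank experts mixed by a learned gate, roughly one expert per style. The class name, rank, gating scheme, and expert count below are assumptions inferred from the abstract, not the paper's architecture in detail.

```python
import torch
import torch.nn as nn

class LoRAMoELinear(nn.Module):
    """Sketch of a LoRA-MoE linear layer: frozen base projection plus a
    gated mixture of low-rank (LoRA) experts for multi-style adaptation."""

    def __init__(self, dim, rank=4, num_experts=3):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.down = nn.ModuleList(
            nn.Linear(dim, rank, bias=False) for _ in range(num_experts))
        self.up = nn.ModuleList(
            nn.Linear(rank, dim, bias=False) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)  # routes tokens to experts

    def forward(self, x):
        # x: (B, T, dim); gate produces per-token mixing weights (B, T, E)
        weights = torch.softmax(self.gate(x), dim=-1)
        out = self.base(x)
        for e in range(len(self.down)):
            # add each expert's low-rank update, scaled by its gate weight
            out = out + weights[..., e:e + 1] * self.up[e](self.down[e](x))
        return out
```

Only the small expert and gate parameters train, which is what makes adding a new style cheap relative to fine-tuning the base model.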