🤖 AI Summary
Diffusion models struggle to generate artistic styles not explicitly described in text prompts. To address this, we propose a cross-attention intervention method that requires neither prompt modification nor model fine-tuning. By treating the text-guided cross-attention layers in the UNet as editable interfaces, our approach introduces dynamic attention masking and weight remapping to enable fine-grained modulation of attention maps during denoising. This is the first method to achieve intent-driven zero-shot artistic style synthesis, generating novel, previously unparameterized styles such as contour distortion, color diffusion, and material concretization, while preserving semantic fidelity. Unlike prompt engineering or model adaptation paradigms, our technique transcends inherent constraints on stylistic expressivity, offering a lightweight, efficient, and interpretable framework for controllable image generation.
📝 Abstract
Imagine a human artist looking at a photo generated by a diffusion model and hoping to create a painting out of it. There could be some feature of an object in the photo that the artist wants to emphasize, some color to disperse, some silhouette to twist, or some part of the scene to materialize. These intentions can be viewed as modifications of the cross-attention from the text prompt onto the UNet during the denoising diffusion process. This work presents AttnMod, which modifies attention to create new, unpromptable art styles from existing diffusion models. The style-creating behavior is studied across different setups.
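The abstract describes modulating the text-to-image cross-attention map during denoising. The following is a minimal NumPy sketch of that general idea, not the paper's actual implementation: it scales the attention each image location pays to selected prompt tokens (weight remapping), optionally zeroes out tokens (masking), and renormalizes before applying the values. The function names, the per-token multiplicative gains, and the renormalization choice are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modulated_cross_attention(Q, K, V, token_weights, token_mask=None):
    """Cross-attention with per-token weight remapping and optional masking.

    Q: (n_pixels, d) image-side queries.
    K, V: (n_tokens, d) text-side keys and values.
    token_weights: (n_tokens,) multiplicative gains on the attention map
                   (1.0 = unchanged; >1 emphasizes a token's influence).
    token_mask: optional (n_tokens,) of 0/1; zeroed tokens are dropped.
    """
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)     # (n_pixels, n_tokens)
    attn = attn * token_weights                       # weight remapping
    if token_mask is not None:
        attn = attn * token_mask                      # dynamic masking
    attn = attn / attn.sum(axis=-1, keepdims=True)    # renormalize rows
    return attn @ V
```

In a real diffusion pipeline this modulation would be applied inside each cross-attention layer of the UNet at every denoising step, with the gains chosen per token to realize the artist's intent (e.g. amplifying a color word, suppressing an object word).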