AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion models struggle with fine-grained, continuous, intensity-controllable editing of image aesthetic attributes (e.g., "brightness", "refinement"), because they rely on ambiguous text prompts or costly human preference annotations, which limits scalability. Method: We propose a plug-and-play aesthetic control framework that quantifies abstract aesthetics via semantic similarity from a pretrained vision-language model and introduces a lightweight value encoder that maps intensity scalars in [0,1] to learnable embeddings, integrated seamlessly into text-conditioned diffusion sampling. Contribution/Results: The method requires no human preference labels, enables independent or joint control over multiple attributes, supports continuous cross-intensity editing, and is compatible with mainstream open-source generators (e.g., Stable Diffusion). Experiments demonstrate clear improvements over baselines in both single-attribute fidelity and multi-attribute coordination.

📝 Abstract
Recent breakthroughs in text-to-image diffusion models have significantly enhanced both the visual fidelity and semantic controllability of generated images. However, fine-grained control over aesthetic attributes remains challenging, especially when users require continuous and intensity-specific adjustments. Existing approaches often rely on vague textual prompts, which are inherently ambiguous in expressing both the aesthetic semantics and the desired intensity, or depend on costly human preference data for alignment, limiting their scalability and practicality. To address these limitations, we propose AttriCtrl, a plug-and-play framework for precise and continuous control of aesthetic attributes. Specifically, we quantify abstract aesthetics by leveraging semantic similarity from pre-trained vision-language models, and employ a lightweight value encoder that maps scalar intensities in $[0,1]$ to learnable embeddings within diffusion-based generation. This design enables intuitive and customizable aesthetic manipulation, with minimal training overhead and seamless integration into existing generation pipelines. Extensive experiments demonstrate that AttriCtrl achieves accurate control over individual attributes as well as flexible multi-attribute composition. Moreover, it is fully compatible with popular open-source controllable generation frameworks, showcasing strong integration capability and practical utility across diverse generation scenarios.
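The core mechanism in the abstract, a lightweight value encoder mapping a scalar intensity in $[0,1]$ to an embedding consumed alongside the text conditioning, can be sketched as follows. The paper summary does not specify the encoder's architecture, so the sinusoidal featurization, embedding dimension, and linear projection below are illustrative assumptions, not the authors' actual design:

```python
import numpy as np

class ValueEncoder:
    """Toy sketch of a value encoder: maps an intensity scalar in [0, 1]
    to a d-dimensional embedding. Sinusoidal features plus a linear layer
    stand in for whatever lightweight network AttriCtrl actually trains;
    the weights here are randomly initialized, not learned."""

    def __init__(self, dim=8, n_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        # In the real framework this projection would be learnable.
        self.W = rng.standard_normal((2 * n_freqs, dim)) / np.sqrt(2 * n_freqs)
        self.freqs = 2.0 ** np.arange(n_freqs)  # 1, 2, 4, 8

    def __call__(self, v):
        assert 0.0 <= v <= 1.0, "intensity must lie in [0, 1]"
        # Sinusoidal featurization keeps nearby intensities close in
        # feature space, which supports continuous cross-intensity editing.
        feats = np.concatenate([np.sin(np.pi * self.freqs * v),
                                np.cos(np.pi * self.freqs * v)])
        return feats @ self.W  # embedding appended to the text conditioning

enc = ValueEncoder(dim=8)
print(enc(0.5).shape)  # (8,)
```

In a diffusion pipeline, the resulting embedding would be concatenated with (or added to) the prompt embeddings at each sampling step, which is what makes the control plug-and-play with respect to the frozen generator.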
Problem

Research questions and friction points this paper is trying to address.

Fine-grained control of aesthetic attributes in diffusion models
Continuous and intensity-specific adjustments for generated images
Overcoming vague textual prompts and costly human preference data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play framework for aesthetic control
Semantic similarity from vision-language models
Lightweight value encoder for intensity mapping
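The first two innovation points, quantifying an abstract aesthetic attribute through vision-language semantic similarity, can be illustrated with a contrastive scoring sketch. The anchor prompts, the normalization, and the placeholder embeddings below are all assumptions; the actual framework would obtain embeddings from a pretrained vision-language model such as CLIP:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def aesthetic_score(img_emb, pos_emb, neg_emb):
    """Map an image embedding to a [0, 1] intensity for one attribute by
    contrasting its similarity to a positive anchor (e.g. "a bright photo")
    against a negative anchor (e.g. "a dark photo"). Real anchors would be
    text embeddings from a pretrained VLM; these are placeholder vectors."""
    s_pos, s_neg = cosine(img_emb, pos_emb), cosine(img_emb, neg_emb)
    # Relative similarity in [-2, 2], squashed linearly into [0, 1].
    return 0.5 * (1.0 + (s_pos - s_neg) / 2.0)

rng = np.random.default_rng(0)
pos = rng.standard_normal(16)                     # "bright" anchor embedding
neg = rng.standard_normal(16)                     # "dark" anchor embedding
bright_img = pos + 0.1 * rng.standard_normal(16)  # near the positive anchor
dark_img = neg + 0.1 * rng.standard_normal(16)    # near the negative anchor
print(aesthetic_score(bright_img, pos, neg) > aesthetic_score(dark_img, pos, neg))  # True
```

Scores of this form could label training images automatically, which is how the framework avoids human preference annotations: the intensity targets for the value encoder come from the VLM rather than from raters.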