🤖 AI Summary
To address imprecise fine-grained attribute control (e.g., color, material), structural distortion, and poor global consistency in text-to-image diffusion models, this paper proposes SPAA (Structure-Preserving and Attribute Amplification), a training-free method for object-level attribute editing. SPAA introduces a joint editing mechanism that simultaneously manipulates self-attention maps and cross-attention values, enabling precise modification of an object's color and material while preserving its structure. The authors also construct the first comprehensive Attribute Dataset, covering nearly all color and material combinations for various objects, using an MLLM-powered automated pipeline for data filtering and instruction labeling, and train InstructAttribute, an instruction-based editing model, on this dataset. The approach is validated across mainstream T2I diffusion models. Experiments demonstrate that it significantly outperforms existing instruction-based editing methods while preserving object structural integrity and global image coherence, achieving state-of-the-art performance on fine-grained attribute editing tasks.
📝 Abstract
Text-to-image (T2I) diffusion models, renowned for their advanced generative abilities, are widely used in image editing applications with remarkable effectiveness. However, precise control over fine-grained attributes remains challenging. Existing image editing techniques either fail to modify an object's attributes or struggle to preserve its structure and maintain consistency in the rest of the image. To address these challenges, we propose Structure-Preserving and Attribute Amplification (SPAA), a training-free method that enables precise control over the color and material transformations of objects by editing the self-attention maps and cross-attention values. Furthermore, we construct the Attribute Dataset, which encompasses nearly all colors and materials associated with various objects, by integrating multimodal large language models (MLLMs) into an automated pipeline for data filtering and instruction labeling. Using this dataset, we train InstructAttribute, an instruction-based model for fine-grained editing of color and material attributes. Extensive experiments demonstrate that our method achieves superior performance in object-level color and material editing, outperforming existing instruction-based image editing approaches.
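The core mechanism is easiest to picture as a hook inside a diffusion UNet attention layer. Below is a minimal, illustrative sketch (not the authors' released code) of the two interventions the abstract names, assuming a standard multi-head attention layout; the names `source_self_attn`, `attr_token_idx`, and `amp` are placeholders, not identifiers from the paper.

```python
import torch


def edited_attention(q, k, v, *, source_self_attn=None, attr_token_idx=None, amp=1.5):
    """One attention layer's forward pass with SPAA-style interventions (sketch).

    q, k, v: (batch, heads, tokens, dim) projections from the layer.
    source_self_attn: self-attention map cached from the source image's
        denoising pass; reusing it is one plausible way to preserve the
        object's spatial structure (assumed here, exact schedule may differ).
    attr_token_idx: index of the attribute word (e.g. "golden") in the text
        embedding, whose value vector is amplified in cross-attention.
    """
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)

    if source_self_attn is not None:
        # Structure preservation: inject the source pass's self-attention map
        # so the edited image keeps the original object layout.
        attn = source_self_attn

    if attr_token_idx is not None:
        # Attribute amplification: scale the target attribute token's value
        # vector so the new color/material is expressed more strongly.
        v = v.clone()
        v[..., attr_token_idx, :] = v[..., attr_token_idx, :] * amp

    return attn @ v
```

In a diffusers-style pipeline, a function like this would be wired in as a custom attention processor on each UNet block during the editing pass, with the self-attention maps recorded during a reconstruction pass of the source image.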