🤖 AI Summary
Existing text-driven image editing methods struggle to accurately model and control emotional attributes in images, because they lack semantic understanding and a continuous representation of users' affective intent. To address this, we propose the first emotion-instruction-guided image editing framework, consisting of: (1) a Continuous Emotion Spectrum (CES) that models the emotion space; (2) a learnable Emotional Mapper that translates visually abstract emotional requests into visually concrete semantic representations and is trained end to end; and (3) MLLM-supervised training, together with an inference stage that deforms visual elements under semantic guidance and then reshapes the corresponding emotional factors. To support this, we introduce EmoAlign, the first large-scale emotion-aligned image-text dataset. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on key metrics, including emotion fidelity, instruction adherence, and editing quality, and responds precisely and diversely to textual emotion instructions.
📝 Abstract
In daily life, images are common affective stimuli with widespread applications. Despite significant progress in text-driven image editing, little work has focused on understanding users' emotional requests. In this paper, we introduce AIEdiT for Affective Image Editing using Text descriptions, which evokes specific emotions by adaptively shaping multiple emotional factors across the entire image. To represent universal emotional priors, we build a continuous emotional spectrum and extract nuanced emotional requests from it. To manipulate emotional factors, we design an emotional mapper that translates visually abstract emotional requests into visually concrete semantic representations. To ensure that editing results evoke the intended emotions, we introduce a multimodal large language model (MLLM) to supervise model training. During inference, we strategically distort visual elements and then shape the corresponding emotional factors to edit images according to users' instructions. Additionally, we introduce a large-scale dataset of emotion-aligned text-image pairs for training and evaluation. Extensive experiments demonstrate that AIEdiT achieves superior performance and effectively reflects users' emotional requests.
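The abstract does not specify how the continuous emotional spectrum or the emotional mapper are built. The toy sketch below illustrates only the general idea of moving from discrete emotion labels to a continuous representation that is then projected into a conditioning space. It is not AIEdiT's architecture. A nuanced request is modeled as softmax weights over discrete emotion anchors, and the blended anchor vector is passed through a single linear layer. The following are all hypothetical assumptions made for illustration: the eight emotion categories, the random anchor vectors, the untrained linear "mapper", and the names `make_anchors`, `spectrum_point`, and `LinearMapper`. In the described system, the mapper is learned and its training is supervised by an MLLM.

```python
import math
import random

# Hypothetical discrete emotion anchors (an assumption, not taken from the paper).
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_anchors(dim, seed=0):
    """Assign each emotion a random anchor vector (placeholder for learned priors)."""
    rng = random.Random(seed)
    return {e: [rng.gauss(0.0, 1.0) for _ in range(dim)] for e in EMOTIONS}


def spectrum_point(request_scores, anchors):
    """Blend anchors by softmax weights, yielding a point in a continuous space.

    Emotions missing from request_scores get a raw score of 0.0.
    """
    weights = softmax([request_scores.get(e, 0.0) for e in EMOTIONS])
    dim = len(next(iter(anchors.values())))
    point = [0.0] * dim
    for w, e in zip(weights, EMOTIONS):
        for i, v in enumerate(anchors[e]):
            point[i] += w * v
    return weights, point


class LinearMapper:
    """Hypothetical mapper: one untrained linear layer into a conditioning space."""

    def __init__(self, in_dim, out_dim, seed=1):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.w = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]


# Usage: a request leaning toward "awe" with some "contentment".
anchors = make_anchors(dim=16)
mapper = LinearMapper(in_dim=16, out_dim=32)
weights, point = spectrum_point({"awe": 2.0, "contentment": 1.0}, anchors)
cond = mapper(point)
```

Because the weights vary smoothly with the request scores, small changes in a request produce small moves in the blended point. This is the property a continuous representation has and a single discrete label lacks.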