Affective Image Editing: Shaping Emotional Factors via Text Descriptions

📅 2025-05-24

🤖 AI Summary
Existing text-driven image editing methods struggle to model and control the emotional attributes of an image: they lack both a semantic understanding of users' affective intent and a continuous representation of it. To address this, we propose the first emotion-instruction-guided image editing framework, built from three components: (1) a Continuous Emotion Spectrum (CES) that models the emotion space; (2) a learnable Emotional Mapper that translates visually abstract emotional requests into visually concrete semantic representations, trained end to end; and (3) MLLM-supervised training combined with semantics-guided distortion of visual elements at inference. To support training and evaluation, we introduce EmoAlign, the first large-scale emotion-aligned image-text dataset. Experiments show that the method outperforms state-of-the-art approaches on emotion fidelity, instruction adherence, and editing quality, and that it responds precisely and with variety to textual emotion instructions.

📝 Abstract
In daily life, images, as common affective stimuli, have widespread applications. Despite significant progress in text-driven image editing, there is limited work focusing on understanding users' emotional requests. In this paper, we introduce AIEdiT for Affective Image Editing using Text descriptions, which evokes specific emotions by adaptively shaping multiple emotional factors across the entire image. To represent universal emotional priors, we build the continuous emotional spectrum and extract nuanced emotional requests. To manipulate emotional factors, we design the emotional mapper to translate visually abstract emotional requests into visually concrete semantic representations. To ensure that editing results evoke specific emotions, we introduce an MLLM to supervise the model training. During inference, we strategically distort visual elements and subsequently shape the corresponding emotional factors to edit images according to users' instructions. Additionally, we introduce a large-scale dataset that includes an emotion-aligned text and image pair set for training and evaluation. Extensive experiments demonstrate that AIEdiT achieves superior performance, effectively reflecting users' emotional requests.
Problem

Research questions and friction points this paper is trying to address.

Editing images to evoke specific emotions via text descriptions
Translating abstract emotional requests into visual representations
Creating emotion-aligned datasets for training and evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Build continuous emotional spectrum for nuanced requests
Design emotional mapper for abstract-to-concrete translation
Use MLLM supervision for emotion-specific editing
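
To make the abstract-to-concrete idea above more tangible, here is a minimal toy sketch in plain Python. It is not the paper's method: AIEdiT learns its emotional mapper, whereas this example uses a hand-written lookup table. The 2-D (valence, arousal) coordinates, the anchor emotions, the factor names (`brightness`, `saturation`, `warmth`), and the functions `spectrum_weights` and `map_to_factors` are all hypothetical and chosen only for illustration. The sketch shows how a point on a continuous spectrum could be softly assigned to anchor emotions and then blended into concrete visual factors.

```python
import math

# ASSUMPTION: a toy 2-D emotion space (valence, arousal). The paper's
# continuous emotional spectrum is learned and is not defined this way.
ANCHORS = {
    "contentment": (0.8, -0.4),
    "excitement": (0.7, 0.8),
    "fear": (-0.7, 0.7),
    "sadness": (-0.7, -0.5),
}

# ASSUMPTION: hand-picked visual factors per anchor, each in [0, 1].
# This table stands in for a learned emotional mapper.
FACTORS = {
    "contentment": {"brightness": 0.7, "saturation": 0.5, "warmth": 0.7},
    "excitement": {"brightness": 0.8, "saturation": 0.9, "warmth": 0.8},
    "fear": {"brightness": 0.2, "saturation": 0.4, "warmth": 0.2},
    "sadness": {"brightness": 0.3, "saturation": 0.2, "warmth": 0.3},
}


def spectrum_weights(point, temperature=0.5):
    """Softly assign a continuous emotion point to the anchor emotions.

    Closer anchors get larger weights; the weights sum to 1 (a softmax
    over negative Euclidean distances).
    """
    scores = {
        name: -math.dist(point, xy) / temperature
        for name, xy in ANCHORS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def map_to_factors(point, temperature=0.5):
    """Blend the per-anchor factor tables using the spectrum weights."""
    weights = spectrum_weights(point, temperature)
    keys = ("brightness", "saturation", "warmth")
    return {
        key: sum(weights[name] * FACTORS[name][key] for name in ANCHORS)
        for key in keys
    }


if __name__ == "__main__":
    # A request lying between "excitement" and "fear" on the toy spectrum.
    print(map_to_factors((0.0, 0.9)))
```

Because the weights vary smoothly with the input point, nearby emotional requests yield nearby factor settings, which is the property a continuous representation is meant to provide over a fixed set of discrete emotion labels.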
Authors
Peixuan Zhang
PhD student at Penn State University
Stochastic optimization, Convex optimization, Chance-constrained optimization, Machine Learning
Shuchen Weng
Beijing Academy of Artificial Intelligence
Chengxuan Zhu
State Key Laboratory for Multimedia Information Processing and National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, China
Binghao Tang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Zijian Jia
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Si Li
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China
Boxin Shi
Peking University
Computer Vision, Computational Photography