🤖 AI Summary
Affective Image Manipulation (AIM) requires joint comprehension of semantics, precise manipulation of visual elements, and rigorous validation of elicited emotional responses—posing challenges for existing methods in balancing accuracy and naturalness. This paper introduces the first multi-agent collaborative framework tailored for AIM, inspired by human painters’ cognitive workflows; it comprises planning, editing, and critique agents operating in a closed-loop system. Key innovations include an emotion-factor knowledge retriever, an interpretable decision-tree-based spatial modeling mechanism, a domain-specific visual editing tool library, and a semantic-aware emotion alignment module. Quantitative evaluation across multiple emotion plausibility and effectiveness metrics demonstrates significant improvements over state-of-the-art methods. The generated images achieve superior fidelity and naturalness in conveying target emotions, validating both functional efficacy and perceptual authenticity.
📝 Abstract
Affective Image Manipulation (AIM) aims to alter an image's emotional impact by adjusting multiple visual elements to evoke specific feelings. Effective AIM is inherently complex, necessitating a collaborative approach that involves identifying semantic cues within source images, manipulating these elements to elicit desired emotional responses, and verifying that the combined adjustments successfully evoke the target emotion. To address these challenges, we introduce EmoAgent, the first multi-agent collaboration framework for AIM. By emulating the cognitive behaviors of a human painter, EmoAgent incorporates three specialized agents responsible for planning, editing, and critical evaluation. Furthermore, we develop an emotion-factor knowledge retriever, a decision-making tree space, and a tool library to enhance EmoAgent's effectiveness in handling AIM. Experiments demonstrate that the proposed multi-agent framework outperforms existing methods, offering more reasonable and effective emotional expression.