🤖 AI Summary
Existing image emotion editing methods struggle to disentangle emotional and content representations, often resulting in weak emotional expression and structural distortion. This work proposes a training-free emotion editing framework that first constructs a multimodal emotion-associated knowledge graph to guide large models in locating emotion-relevant visual cues through chain-of-thought reasoning. It then introduces a latent space disentanglement module to separate emotional attributes from layout features, enabling precise emotion injection while preserving structural integrity. By integrating multimodal knowledge graphs into emotion editing for the first time, the proposed method significantly enhances both emotional fidelity and content-structure consistency, outperforming current state-of-the-art approaches.
📝 Abstract
Existing image emotion editing methods struggle to disentangle emotional cues from latent content representations, often yielding weak emotional expression and distorted visual structures. To bridge this gap, we propose EmoKGEdit, a novel training-free framework for precise and structure-preserving image emotion editing. Specifically, we construct a Multimodal Sentiment Association Knowledge Graph (MSA-KG) to disentangle the intricate relationships among objects, scenes, attributes, visual cues, and emotions. MSA-KG explicitly encodes the object-attribute-emotion causal chain and serves as external knowledge for chain-of-thought reasoning, guiding the multimodal large model to infer plausible emotion-related visual cues and generate coherent editing instructions. In addition, building on MSA-KG, we design a disentangled structure-emotion editing module that explicitly separates emotional attributes from layout features within the latent space, ensuring that the target emotion is effectively injected while spatial coherence is strictly preserved. Extensive experiments demonstrate that EmoKGEdit achieves excellent performance in both emotion fidelity and content preservation, outperforming state-of-the-art methods.
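The two-stage pipeline described in the abstract can be pictured with a minimal, self-contained sketch (not the paper's implementation): a toy MSA-KG is queried to assemble a chain-of-thought prompt for a multimodal large model, and a latent recombination step then blends only the emotion component while keeping the layout component fixed. All names here (MSA_KG, build_cot_prompt, inject_emotion, alpha) are illustrative assumptions, not identifiers from the paper.

```python
# Illustrative sketch only, under the assumptions stated above.
import numpy as np

# Toy MSA-KG: object -> attribute -> associated emotion (hypothetical entries).
MSA_KG = {
    "sky":  {"attributes": {"overcast": "sadness", "golden": "joy"}},
    "tree": {"attributes": {"bare": "loneliness", "blossoming": "joy"}},
}

def build_cot_prompt(objects, target_emotion):
    """Collect KG edges whose emotion matches the target and phrase them as reasoning steps."""
    steps = []
    for obj in objects:
        for attr, emo in MSA_KG.get(obj, {}).get("attributes", {}).items():
            if emo == target_emotion:
                steps.append(f"Editing the {obj} to look {attr} evokes {emo}.")
    return (f"Reason step by step about which visual cues convey '{target_emotion}':\n"
            + "\n".join(steps))

def inject_emotion(layout_latent, emotion_latent, target_emotion_latent, alpha=0.7):
    """Keep the layout component fixed and blend only the emotion component."""
    edited_emotion = (1 - alpha) * emotion_latent + alpha * target_emotion_latent
    return layout_latent + edited_emotion  # recombined latent passed on to the decoder

if __name__ == "__main__":
    print(build_cot_prompt(["sky", "tree"], "joy"))
    z_layout, z_emotion, z_target = (np.random.randn(4) for _ in range(3))
    print(inject_emotion(z_layout, z_emotion, z_target))
```

In an actual system the prompt would be sent to a multimodal large language model and the latents would come from a diffusion model's denoising process; the sketch only illustrates how separating layout from emotion in the latent space allows the emotion component to be edited without disturbing spatial structure.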