🤖 AI Summary
This work addresses two limitations of existing affective image editing methods: low inference efficiency and reliance on discrete emotion labels, which fail to capture nuanced, continuous emotional states. The authors propose MooD, a framework for fine-grained and efficient affective image editing grounded in the continuous valence-arousal (VA) space. Key contributions include AffectSet, a VA-annotated dataset for model optimization and evaluation; a VA-aware retrieval strategy that maps affective values to concrete visual semantics; and a hybrid mechanism that combines visual transfer with semantic guidance for controllable editing. The reported experiments show MooD outperforming existing approaches in affective controllability and visual fidelity while maintaining high inference efficiency.
📝 Abstract
Affective image editing (AIE) aims to edit visual content to evoke target emotions. However, existing methods often overlook inference efficiency and depend predominantly on discrete emotion representations, which limits their practical applicability and makes it difficult to capture complex and subtle human emotions. To close these gaps, we propose MooD, the first framework that directly leverages continuous Valence-Arousal (VA) values for fine-grained and efficient AIE. Specifically, we first introduce a VA-aware retrieval strategy to bridge vague affective values and concrete visual semantics. Building on this, MooD integrates visual transfer and semantic guidance to achieve controllable AIE. Furthermore, we construct AffectSet, a VA-annotated dataset that supports model optimization and evaluation. Extensive qualitative and quantitative experiments demonstrate that MooD achieves superior affective controllability and visual fidelity while maintaining high efficiency. A series of ablation studies further identifies the components of our design that matter most. Our code and data will be made publicly available soon.
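The abstract does not specify how the VA-aware retrieval works, so the following is only a minimal sketch of one plausible reading: a nearest-neighbor lookup that maps a target (valence, arousal) pair to concrete visual semantics. The `SEMANTIC_BANK` entries, the [-1, 1] value range, the Euclidean distance, and the function name `retrieve_semantics` are all assumptions made for illustration, not details taken from MooD.

```python
import math

# Hypothetical bank of (valence, arousal, semantic phrase) entries.
# Coordinates are assumed to lie in [-1, 1]; all values are illustrative only.
SEMANTIC_BANK = [
    (0.8, 0.7, "fireworks over a festival crowd"),
    (0.7, 0.2, "calm lake at sunrise"),
    (-0.6, 0.8, "storm clouds and lightning"),
    (-0.7, -0.5, "empty street in the rain"),
]

def retrieve_semantics(valence, arousal, k=2, bank=SEMANTIC_BANK):
    """Return the k phrases whose VA coordinates are closest to the target.

    Closeness is plain Euclidean distance in the 2-D VA plane, which is an
    assumption of this sketch rather than the paper's stated metric.
    """
    ranked = sorted(
        bank,
        key=lambda entry: math.hypot(entry[0] - valence, entry[1] - arousal),
    )
    return [phrase for _, _, phrase in ranked[:k]]

# Example: a high-valence, high-arousal target.
print(retrieve_semantics(0.75, 0.6, k=1))
```

In a pipeline of the kind the abstract describes, phrases retrieved this way could serve as the semantic guidance for the editing stage; how MooD actually combines them with visual transfer is not stated here.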