MooD: An Efficient VA-Driven Affective Image Editing Framework via Fine-Grained Semantic Control

📅 2026-05-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two limitations of existing affective image editing methods: low inference efficiency and reliance on discrete emotion labels, which fail to capture nuanced, continuous emotional states. The authors propose MooD, a framework for fine-grained and efficient affective image editing grounded in the continuous valence-arousal (VA) space. Key contributions include a VA-aware semantic retrieval strategy that maps VA values to concrete visual semantics, a hybrid mechanism that combines visual transfer with semantic guidance for controllable editing, and AffectSet, a VA-annotated dataset for model optimization and evaluation. According to the authors, experiments show that MooD outperforms existing approaches in affective controllability and visual fidelity while maintaining high efficiency.
📝 Abstract
Affective image editing (AIE) aims to edit visual content to evoke target emotions. However, existing methods often overlook inference efficiency and predominantly depend on discrete emotion representations, which limits their practical applicability and makes it difficult to capture complex and subtle human emotions. To address these gaps, we propose MooD, the first framework that directly leverages continuous Valence-Arousal (VA) values for fine-grained and efficient AIE. Specifically, we first introduce a VA-aware retrieval strategy to bridge vague affective values and concrete visual semantics. Building upon this, MooD integrates visual transfer and semantic guidance to achieve controllable AIE. Furthermore, we construct AffectSet, a VA-annotated dataset to support model optimization and evaluation. Extensive qualitative and quantitative experimental results demonstrate that MooD achieves superior performance in both affective controllability and visual fidelity while maintaining high efficiency. A series of ablation studies further identify the crucial factors of our design. Our code and data will be made publicly available soon.
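The abstract does not spell out how the VA-aware retrieval step works. As a rough illustration only, one way to bridge a continuous VA value and concrete visual semantics is a nearest-neighbor lookup over a bank of VA-tagged descriptors. Everything in the sketch below is an assumption made for illustration and is not taken from the paper: the `SEMANTIC_BANK` entries, the [-1, 1] VA range, the Euclidean distance, and the `retrieve_semantics` helper.

```python
import math

# Hypothetical bank of semantic descriptors, each tagged with an assumed
# (valence, arousal) pair in [-1, 1]. These values are illustrative and
# do not come from AffectSet or the paper.
SEMANTIC_BANK = [
    ("sunlit meadow", (0.8, 0.3)),
    ("fireworks over a crowd", (0.7, 0.9)),
    ("foggy empty street", (-0.5, -0.4)),
    ("stormy sea", (-0.6, 0.8)),
    ("quiet candlelit room", (0.4, -0.6)),
]


def retrieve_semantics(target_va, bank=SEMANTIC_BANK, k=2):
    """Return the k descriptors whose VA tags lie closest to target_va.

    Closeness is plain Euclidean distance in the 2-D VA plane, which is an
    assumed stand-in for whatever similarity the actual method uses.
    """
    ranked = sorted(bank, key=lambda item: math.dist(item[1], target_va))
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Negative valence with high arousal, e.g. a tense or fearful target.
    print(retrieve_semantics((-0.6, 0.7), k=2))
```

In a pipeline of this shape, the retrieved descriptors could then serve as the semantic guidance passed to the editing model, so nearby VA targets would yield gradually shifting semantics instead of jumping between discrete emotion categories.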
Problem

Research questions and friction points this paper is trying to address.

Affective Image Editing
Valence-Arousal
Emotion Representation
Inference Efficiency
Fine-Grained Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Valence-Arousal
Affective Image Editing
Fine-Grained Semantic Control
VA-Aware Retrieval
AffectSet
Xinyi Yin
School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
Yiduo Wang
Postdoc, ACFR, University of Sydney
Robotics, Perception
Tingqi Hu
School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
Meicong Si
School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
Yunyun Shi
School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
Shi Chen
School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
Hao Wang
School of Journalism and New Media, Xi’an Jiaotong University, Xi’an 710049, China
Junxiao Xue
Zhejiang Lab
Computer Graphics, Crowd simulation, Multi-agents Modeling, Multi-modal Learning
Xuecheng Wu
School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China