Make Me Happier: Evoking Emotions Through Image Diffusion Models

📅 2024-03-13
🏛️ arXiv.org
📈 Citations: 3 (1 influential)
🤖 AI Summary
This study addresses the underexplored yet critical problem of affective image editing by introducing, for the first time, the task of *emotion-eliciting image generation*: precisely modifying an input image to evoke a target emotion while preserving its semantic content and structural integrity, enabling applications in psychological intervention and commercial design. Methodologically, the authors construct the first large-scale emotion-annotated image-pair dataset (340K pairs) and propose a diffusion-based editing paradigm integrating latent-space emotion guidance, cross-modal emotion embedding, and structural consistency constraints. For evaluation, they design four metrics grounded in human psychophysics experiments. Extensive experiments demonstrate that the approach consistently outperforms existing baselines across both objective and subjective evaluations, achieving superior emotion recognition accuracy, fine-grained edit controllability, and structural fidelity.
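As a concrete illustration of this paradigm, the sketch below shows classifier-guidance-style editing in PyTorch: a gradient combining an emotion-embedding loss with a structural consistency loss steers each reverse diffusion step. All names and update rules here (TinyUNet, EmotionHead, emotion_guided_edit, the step sizes) are hypothetical stand-ins for illustration only, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Toy stand-in for a pretrained diffusion noise predictor."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x, t):
        return self.conv(x)  # a real UNet would also condition on timestep t

class EmotionHead(nn.Module):
    """Toy stand-in for an emotion encoder mapping images to embeddings."""
    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(3, dim)

    def forward(self, x):
        return self.proj(x.mean(dim=(2, 3)))  # global-pooled RGB -> embedding

def emotion_guided_edit(x_src, target_emb, unet, emo_head,
                        steps=10, guidance=2.0, struct_w=1.0):
    """Steer reverse diffusion toward a target emotion embedding while
    penalizing drift from the source image (schematic, not the paper's method)."""
    x_t = x_src + 0.5 * torch.randn_like(x_src)  # partially noise the source
    for t in reversed(range(steps)):
        x_t = x_t.detach().requires_grad_(True)
        eps = unet(x_t, t)
        x0_hat = x_t - eps  # crude one-step estimate of the clean image
        loss = (F.mse_loss(emo_head(x0_hat), target_emb)     # emotion guidance
                + struct_w * F.l1_loss(x0_hat, x_src))       # structural consistency
        grad, = torch.autograd.grad(loss, x_t)
        x_t = (x_t - 0.1 * (eps + guidance * grad)).detach() # guided reverse step
    return x_t

# Usage: nudge a random "image" toward a random target emotion embedding.
x = torch.rand(1, 3, 64, 64)
edited = emotion_guided_edit(x, torch.randn(1, 8), TinyUNet(), EmotionHead())
print(edited.shape)  # torch.Size([1, 3, 64, 64])
```

The key design choice this sketch illustrates is that emotion guidance and structural preservation act as competing gradient terms at every denoising step, so the trade-off between evoking the target emotion and keeping the original scene is tunable via the two weights.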

📝 Abstract
Despite the rapid progress in image generation, emotional image editing remains under-explored. The semantics, context, and structure of an image can evoke emotional responses, making emotional image editing techniques valuable for various real-world applications, including treatment of psychological disorders, commercialization of products, and artistic design. For the first time, we present a novel challenge of emotion-evoked image generation, aiming to synthesize images that evoke target emotions while retaining the semantics and structures of the original scenes. To address this challenge, we propose a diffusion model capable of effectively understanding and editing source images to convey desired emotions and sentiments. Moreover, due to the lack of emotion editing datasets, we provide a unique dataset consisting of 340,000 pairs of images and their emotion annotations. Furthermore, we conduct human psychophysics experiments and introduce four new evaluation metrics to systematically benchmark all the methods. Experimental results demonstrate that our method surpasses all competitive baselines. Our diffusion model is capable of identifying emotional cues from original images, editing images that elicit desired emotions, and meanwhile, preserving the semantic structure of the original images. All code, model, and dataset will be made public.
Problem

Research questions and friction points this paper is trying to address.

Emotion-evoked image generation remains largely unexplored
Editing images to effectively convey desired emotions
Absence of datasets for emotion editing tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emotion-aware diffusion model for image editing
Large dataset with 340k emotion-annotated image pairs
Four new evaluation metrics grounded in human psychophysics experiments
Qing Lin
I2R and CFAR, Agency for Science, Technology and Research (A*STAR), Singapore; Nanyang Technological University, Singapore
Jingfeng Zhang
School of Computer Science, the University of Auckland, New Zealand; RIKEN AIP, Tokyo, Japan
Y. Ong
CFAR, Agency for Science, Technology and Research (A*STAR), Singapore; Nanyang Technological University, Singapore
Mengmi Zhang
Assistant professor and PI of Deep NeuroCognition Lab, Nanyang Technological University, Singapore
neuroscience-inspired AI, computer vision, computational neuroscience, cognitive science