🤖 AI Summary
Existing affective image generation methods suffer from the "affective shortcut" problem: they equate emotion solely with semantic descriptions, yielding outputs that lack authentic emotional expression. To address this, we propose Emotion-Director, a cross-modal collaborative framework comprising MC-Diffusion (a diffusion-based generator guided by both visual and textual prompts) and MC-Agent (a multi-agent prompt rewriting system). The core innovation is emotion-visual disentanglement modeling: (i) disentangling emotion from semantics via DPO optimization augmented with a negative visual prompt; and (ii) generating subjective, non-semantic emotion prompts through multi-agent chain-of-concept reasoning. The method integrates diffusion modeling, cross-modal contrastive learning, and chain-of-concept prompting. Extensive experiments on multiple affective benchmarks demonstrate significant improvements over state-of-the-art methods, and quantitative and qualitative evaluations confirm substantial gains in emotional accuracy, diversity, and visual expressiveness.
📝 Abstract
Image generation based on diffusion models has demonstrated impressive capability, motivating exploration of diverse and specialized applications. Owing to the importance of emotion in advertising, emotion-oriented image generation has attracted increasing attention. However, current emotion-oriented methods suffer from an affective shortcut, in which emotion is reduced to semantics; as two decades of research have shown, emotion is not equivalent to semantics. To this end, we propose Emotion-Director, a cross-modal collaboration framework consisting of two modules. First, we propose a cross-Modal Collaborative diffusion model, abbreviated MC-Diffusion, which integrates visual prompts with textual prompts for guidance, enabling the generation of emotion-oriented images beyond semantics. We further improve DPO optimization with a negative visual prompt, enhancing the model's sensitivity to different emotions under the same semantics. Second, we propose MC-Agent, a cross-Modal Collaborative Agent system that rewrites textual prompts to express the intended emotions. To avoid template-like rewrites, MC-Agent employs multiple agents to simulate human subjectivity toward emotion, and adopts a chain-of-concept workflow that improves the visual expressiveness of the rewritten prompts. Extensive qualitative and quantitative experiments demonstrate the superiority of Emotion-Director in emotion-oriented image generation.
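The abstract does not spell out the loss. As a rough illustration of what "DPO optimization improved by a negative visual prompt" could mean, here is a minimal sketch using scalar log-likelihoods: standard DPO contrasts a preferred and a dispreferred sample against a frozen reference model, and a hypothetical extra term (the `lam` weight and all function names are our assumptions, not the paper's) additionally pushes the policy away from a sample conditioned on a negative visual prompt, i.e. same semantics but the wrong emotion.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Standard DPO: prefer the "winning" sample over the "losing" one,
    # with log-likelihoods measured relative to a frozen reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

def dpo_loss_neg_visual(logp_w, logp_l, logp_neg,
                        ref_logp_w, ref_logp_l, ref_logp_neg,
                        beta=0.1, lam=0.5):
    # Hypothetical extension: an extra DPO term treats the sample generated
    # under a negative visual prompt (same semantics, mismatched emotion)
    # as a second "loser", sharpening emotion sensitivity under fixed semantics.
    base = dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
    neg = dpo_loss(logp_w, logp_neg, ref_logp_w, ref_logp_neg, beta)
    return base + lam * neg
```

With equal log-likelihoods the margin is zero and the base loss is log 2; raising the preferred sample's likelihood lowers it, which is the usual DPO behavior this sketch preserves.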
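The MC-Agent workflow described above (multiple agents simulating subjective views of an emotion, each rewriting through a chain of concepts) might be organized roughly as follows. This is only a structural sketch: the `llm` stub and all prompt wordings are placeholders we introduce for illustration; a real system would call an actual language model at each step.

```python
def llm(prompt: str) -> str:
    # Placeholder for a language-model call; returns a canned string here.
    return f"[response to: {prompt[:40]}]"

def chain_of_concept_rewrite(base_prompt: str, emotion: str, personas: list) -> list:
    # Each persona agent contributes a subjective rewrite via a concept chain:
    # emotion -> abstract concepts -> concrete visual cues -> rewritten prompt.
    rewrites = []
    for persona in personas:
        concepts = llm(f"As {persona}, list concepts evoked by '{emotion}'")
        cues = llm(f"Turn these concepts into concrete visual cues: {concepts}")
        rewrites.append(llm(f"Rewrite '{base_prompt}' using these cues: {cues}"))
    return rewrites
```

Running several personas in parallel is one plausible way to avoid the template-like rewrites a single deterministic rewriter tends to produce.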