AI Summary
This paper introduces the Affective Image Filter (AIF) task: transforming abstract emotional semantics in text into concrete, highly evocative visual images. To address this task, we construct the first dedicated AIF benchmark dataset; propose AIF-B, a multi-modal transformer-based model; and introduce AIF-D, a novel framework that integrates pre-trained diffusion models (e.g., Stable Diffusion) as generative priors with a cross-modal affective semantic alignment mechanism. We further establish a dual-objective evaluation metric that balances content fidelity and affective authenticity. Experiments demonstrate that AIF-D achieves statistically significant improvements over state-of-the-art methods across quantitative metrics. User studies confirm its superior ability to elicit target emotions (p < 0.01) and show that its outputs attain the highest scores in both affective credibility and content relevance.
Abstract
Social media platforms enable users to express emotions by posting text with accompanying images. In this paper, we propose the Affective Image Filter (AIF) task, which aims to reflect visually abstract emotions from text in visually concrete images, thereby creating emotionally compelling results. We first introduce the AIF dataset and the formulation of AIF models. We then present AIF-B, an initial attempt based on a multi-modal transformer architecture, and propose AIF-D, an extension of AIF-B toward deeper emotional reflection that effectively leverages generative priors from pre-trained large-scale diffusion models. Quantitative and qualitative experiments demonstrate that AIF models outperform state-of-the-art methods in both content consistency and emotional fidelity. Extensive user studies show that AIF models are significantly more effective at evoking the intended emotions. Based on these results, we discuss the value and potential of AIF models.
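The dual-objective evaluation mentioned above is not specified in detail in this summary. As a minimal sketch, one might combine a content-consistency term (e.g., cosine similarity between image and text embeddings) with an emotional-fidelity term (e.g., the probability an emotion classifier assigns to the target emotion). All names, the embedding source, and the weighting scheme below are illustrative assumptions, not the paper's actual metric.

```python
import numpy as np

def dual_objective_score(img_emb: np.ndarray,
                         txt_emb: np.ndarray,
                         emotion_probs: np.ndarray,
                         target_emotion: int,
                         alpha: float = 0.5) -> float:
    """Hypothetical dual-objective score: a convex combination of
    content fidelity (cosine similarity between image and text
    embeddings, rescaled to [0, 1]) and affective authenticity
    (classifier probability of the target emotion).
    This is an illustrative sketch, not the paper's metric."""
    # Content fidelity: cosine similarity in [-1, 1], mapped to [0, 1].
    cos = float(img_emb @ txt_emb /
                (np.linalg.norm(img_emb) * np.linalg.norm(txt_emb)))
    content_fidelity = (cos + 1.0) / 2.0
    # Affective authenticity: probability mass on the target emotion.
    affective = float(emotion_probs[target_emotion])
    # Trade off the two objectives with a single weight alpha.
    return alpha * content_fidelity + (1.0 - alpha) * affective

# Toy usage with random embeddings and a uniform emotion distribution.
rng = np.random.default_rng(0)
img_emb = rng.normal(size=512)
txt_emb = rng.normal(size=512)
emotion_probs = np.full(8, 1 / 8)  # e.g., 8 emotion categories
print(dual_objective_score(img_emb, txt_emb, emotion_probs, target_emotion=3))
```

In practice, the trade-off weight `alpha` would be tuned or reported alongside both component scores, since a single scalar can hide a fidelity-vs-affect imbalance.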