🤖 AI Summary
Infographic affect analysis is hindered by the scarcity of high-quality annotated data. To address this, the authors introduce InfoAffect, a large-scale, affect-annotated multimodal dataset of 3.5K image–text pairs spanning six domains. Raw data are aligned through preprocessing, an accompanied-text-priority method, and three strategies that guarantee quality and compliance, and annotation is constrained by a curated affect table to ensure cross-modal semantic consistency. Five state-of-the-art multimodal large language models (MLLMs) analyze both modalities, and their outputs are combined via a Reciprocal Rank Fusion (RRF)-based ensemble to yield robust affect labels and confidence scores. A user study reports a Composite Affect Consistency Index (CACI) of 0.986, confirming the dataset's high reliability and annotation robustness. This work establishes a benchmark resource and a reproducible methodological paradigm for infographic affect computation.
📝 Abstract
Infographics are widely used to convey complex information, yet their affective dimensions remain underexplored due to the scarcity of data resources. We introduce InfoAffect, a 3.5k-sample affect-annotated dataset that pairs textual content with real-world infographics. We first collect raw data from six domains and align it via preprocessing, the accompanied-text-priority method, and three strategies that guarantee quality and compliance. After that, we construct an affect table and use it to constrain annotation. Five state-of-the-art multimodal large language models (MLLMs) then analyze both modalities, and their outputs are fused with the Reciprocal Rank Fusion (RRF) algorithm to yield robust affect labels and confidence scores. We conducted a user study with two experiments to validate usability and assess the InfoAffect dataset using the Composite Affect Consistency Index (CACI), achieving an overall score of 0.986, which indicates high accuracy.
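The abstract does not spell out how the five MLLM outputs are fused, but Reciprocal Rank Fusion itself is a standard technique: each candidate label's score is the sum over models of 1/(k + rank), with higher totals ranking first. The sketch below is a minimal, hypothetical illustration of that formula applied to per-model ranked affect lists; the label set, the smoothing constant k = 60 (a common default), the normalization used as a confidence score, and the function name `rrf_fuse` are all assumptions, not details from the paper.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked affect-label lists with Reciprocal Rank Fusion.

    ranked_lists: one list per model, ordered from most to least likely
    affect label. k is the usual RRF smoothing constant (60 is a common
    default; the paper does not state its value).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, label in enumerate(ranking, start=1):
            scores[label] += 1.0 / (k + rank)
    # Sort fused labels by descending RRF score; normalizing the scores
    # gives a rough confidence for each label (an assumption here, not
    # necessarily how the paper derives its confidences).
    total = sum(scores.values())
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(label, score / total) for label, score in fused]

# Hypothetical outputs from five MLLMs for one infographic-text pair.
model_outputs = [
    ["joy", "trust", "surprise"],
    ["joy", "surprise", "trust"],
    ["trust", "joy", "anticipation"],
    ["joy", "trust", "anticipation"],
    ["surprise", "joy", "trust"],
]
print(rrf_fuse(model_outputs))  # "joy" ranks first with the top confidence
```

Because RRF operates only on ranks, it needs no calibration across models whose raw scores are not comparable, which is a natural fit when ensembling heterogeneous MLLMs.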