InfoAffect: A Dataset for Affective Analysis of Infographics

📅 2025-11-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Affective analysis of infographics is hindered by the scarcity of high-quality annotated data. To address this, we introduce InfoAffect, a large-scale, domain-diverse (six domains) multimodal affect dataset comprising 3.5K image–text pairs. We use an accompanied-text-priority alignment strategy and constrain annotation with a curated affect table to ensure cross-modal semantic consistency. Furthermore, we design a reciprocal rank fusion (RRF)-based ensemble framework that integrates five state-of-the-art multimodal large language models, coupled with rigorous preprocessing and quality-control protocols. A user study yields a Composite Affect Consistency Index (CACI) of 0.986, confirming the dataset's high reliability and annotation robustness. This work establishes a benchmark resource and a reproducible methodological paradigm for affective computation over infographics.

📝 Abstract
Infographics are widely used to convey complex information, yet their affective dimensions remain underexplored due to the scarcity of data resources. We introduce InfoAffect, a 3.5k-sample affect-annotated dataset that combines textual content with real-world infographics. We first collect raw data from six domains and align it via preprocessing, the accompanied-text-priority method, and three strategies that guarantee quality and compliance. We then construct an affect table and use it to constrain annotation. Five state-of-the-art multimodal large language models (MLLMs) analyze both modalities, and their outputs are fused with the Reciprocal Rank Fusion (RRF) algorithm to yield robust affects and confidences. We conducted a user study with two experiments to validate usability and assess the InfoAffect dataset using the Composite Affect Consistency Index (CACI), achieving an overall score of 0.986, which indicates high accuracy.
Problem

Research questions and friction points this paper is trying to address.

Analyzes affective dimensions of infographics using multimodal data
Addresses data scarcity in infographic emotion analysis research
Validates dataset quality through user studies and fusion algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal large language models analyze text and graphics
Reciprocal Rank Fusion algorithm combines model outputs
Composite Affect Consistency Index validates dataset accuracy
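The RRF fusion step above can be sketched with the standard reciprocal-rank formula, score(d) = Σ_r 1 / (k + rank_r(d)). This is a generic illustration, not the paper's implementation: the affect labels, the three example rankings, and the choice of k = 60 (the constant from the original RRF formulation) are all assumptions.

```python
# Sketch of Reciprocal Rank Fusion (RRF) over per-model affect rankings.
# Hypothetical example; not the InfoAffect authors' code.
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked label lists from several models into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; 60 is the value from the original RRF paper.
    Returns (label, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, label in enumerate(ranking, start=1):
            scores[label] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Hypothetical affect rankings from three models:
model_rankings = [
    ["joy", "trust", "surprise"],
    ["trust", "joy", "fear"],
    ["joy", "surprise", "trust"],
]
fused = rrf_fuse(model_rankings)
# The top-ranked fused label ("joy" here) would serve as the annotation,
# and its fused score can back a confidence estimate.
```

Because RRF operates only on ranks, it needs no calibration of the five MLLMs' heterogeneous raw scores, which is the usual reason to prefer it for ensembling dissimilar rankers.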
👥 Authors
Zihang Fu — Zhejiang University of Technology
Yunchao Wang — Zhejiang University of Technology
Chenyu Huang — Zhejiang University of Technology
Guodao Sun — Zhejiang University of Technology
Ronghua Liang — Zhejiang University of Technology
Medical Visualization · Image Processing · Big Data Visualization