🤖 AI Summary
Infographic affect analysis is hindered by the scarcity of high-quality annotated data. To address this, the authors introduce InfoAffect, a large-scale, affect-annotated multimodal dataset of 3.5K image–text pairs spanning six domains. Raw data are aligned through preprocessing, an accompanied-text-priority method, and three strategies that guarantee quality and compliance, and annotation is constrained by a curated affect table to ensure cross-modal semantic consistency. Five state-of-the-art multimodal large language models (MLLMs) analyze both modalities, and their outputs are combined via a Reciprocal Rank Fusion (RRF)-based ensemble to yield robust affect labels and confidence scores. A user study reports a Composite Affect Consistency Index (CACI) of 0.986, confirming the dataset's high reliability and annotation robustness. This work establishes a benchmark resource and a reproducible methodological paradigm for infographic affect computation.
📝 Abstract
Infographics are widely used to convey complex information, yet their affective dimensions remain underexplored due to the scarcity of data resources. We introduce InfoAffect, a 3.5k-sample affect-annotated dataset that pairs textual content with real-world infographics. We first collect raw data from six domains and align it via preprocessing, the accompanied-text-priority method, and three strategies that guarantee quality and compliance. After that, we construct an affect table and use it to constrain annotation. Five state-of-the-art multimodal large language models (MLLMs) then analyze both modalities, and their outputs are fused with the Reciprocal Rank Fusion (RRF) algorithm to yield robust affect labels and confidence scores. We conducted a user study with two experiments to validate usability and assess the InfoAffect dataset using the Composite Affect Consistency Index (CACI), achieving an overall score of 0.986, which indicates high accuracy.
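The abstract does not spell out how the five MLLM outputs are fused, but Reciprocal Rank Fusion itself is a standard technique: each candidate label's score is the sum over models of 1/(k + rank), with higher totals ranking first. The sketch below is a minimal, hypothetical illustration of that formula applied to per-model ranked affect lists; the label set, the smoothing constant k = 60 (a common default), the normalization used as a confidence score, and the function name `rrf_fuse` are all assumptions, not details from the paper.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked affect-label lists with Reciprocal Rank Fusion.

    ranked_lists: one list per model, ordered from most to least likely
    affect label. k is the usual RRF smoothing constant (60 is a common
    default; the paper does not state its value).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, label in enumerate(ranking, start=1):
            scores[label] += 1.0 / (k + rank)
    # Sort fused labels by descending RRF score; normalizing the scores
    # gives a rough confidence for each label (an assumption here, not
    # necessarily how the paper derives its confidences).
    total = sum(scores.values())
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(label, score / total) for label, score in fused]

# Hypothetical outputs from five MLLMs for one infographic-text pair.
model_outputs = [
    ["joy", "trust", "surprise"],
    ["joy", "surprise", "trust"],
    ["trust", "joy", "anticipation"],
    ["joy", "trust", "anticipation"],
    ["surprise", "joy", "trust"],
]
print(rrf_fuse(model_outputs))  # "joy" ranks first with the top confidence
```

Because RRF operates only on ranks, it needs no calibration across models whose raw scores are not comparable, which is a natural fit when ensembling heterogeneous MLLMs.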