🤖 AI Summary
Existing multimodal large language models (MLLMs) have not been systematically evaluated on predicting the perceptual and affective impact of charts, so claims about their ability risk overgeneralization. Method: We introduce the first benchmark dataset for evaluating “chart experience impact,” comprising 36 diverse charts annotated via crowdsourcing along seven perceptual and affective dimensions, and supporting both single-chart prediction and chart-pair comparison tasks. We formally define and quantify “experience impact” and propose a novel multidimensional evaluation framework integrating perceptual and affective signals. Contribution/Results: State-of-the-art MLLMs, including LLaVA, Qwen-VL, and Gemini, evaluated via zero-shot and few-shot prompting reach >85% accuracy on chart-pair comparison, approaching human-level performance; single-chart prediction, however, remains markedly weaker, revealing fundamental limitations in deep reasoning about individual charts. This work establishes a new benchmark, task paradigm, and conceptual foundation for intelligent chart understanding.
📝 Abstract
The field of Multimodal Large Language Models (MLLMs) has made remarkable progress in visual understanding tasks, presenting a vast opportunity to predict the perceptual and emotional impact of charts. However, it also raises concerns, as many applications of LLMs are based on overgeneralized assumptions from a few examples, lacking sufficient validation of their performance and effectiveness. We introduce Chart-to-Experience, a benchmark dataset comprising 36 charts, evaluated by crowdsourced workers for their impact on seven experiential factors. Using the dataset as ground truth, we evaluated the capabilities of state-of-the-art MLLMs on two tasks: direct prediction and pairwise comparison of charts. Our findings imply that MLLMs are not as sensitive as human evaluators when assessing individual charts, but are accurate and reliable in pairwise comparisons.
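To make the pairwise-comparison task concrete, below is a minimal Python sketch of how such a zero-shot evaluation might be run. The `query_mllm` callable, the chart-image arguments, and the example factor name "readability" are illustrative assumptions for this sketch, not part of the released benchmark; the paper's actual prompts, factor names, and scoring may differ.

```python
from typing import Callable

def compare_pair(
    query_mllm: Callable[[str, list[str]], str],  # hypothetical: (prompt, image paths) -> model text
    chart_a_path: str,
    chart_b_path: str,
    factor: str = "readability",  # illustrative experiential factor, not from the paper
) -> str:
    """Zero-shot: ask which of two charts has the stronger impact on one factor."""
    prompt = (
        "You are shown two charts, Chart A and Chart B. "
        f"Which chart has a stronger impact on the viewer's {factor}? "
        "Answer with exactly one letter: A or B."
    )
    answer = query_mllm(prompt, [chart_a_path, chart_b_path]).strip().upper()
    return "A" if answer.startswith("A") else "B"

def pairwise_accuracy(model_choices: list[str], human_choices: list[str]) -> float:
    """Agreement between model picks and the crowdsourced majority preference."""
    correct = sum(m == h for m, h in zip(model_choices, human_choices))
    return correct / len(human_choices)
```

Under this framing, the single-chart (direct prediction) task would instead ask the model for a rating on each factor and compare it against the crowdsourced scores, which is where the abstract reports MLLMs falling short of human sensitivity.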