Write, Rank, or Rate: Comparing Methods for Studying Visualization Affordances

📅 2025-07-22
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This study addresses the challenge of efficiently evaluating affordances in information visualization, i.e., how design choices systematically shape readers' interpretive conclusions, without relying on labor-intensive, low-reliability free-text analysis. Through crowdsourced human-subject experiments, the authors comparatively evaluate four elicitation methods (free response, visualization ranking, conclusion ranking, and salience rating) across line charts, dot plots, and heatmaps, and present a case study using GPT-4o as a proxy for human respondents. Results show that combining ranking with salience rating achieves high fidelity relative to free response; that GPT-4o performs well as a proxy for salience rating (r > 0.9) but generalizes poorly to the ranking tasks; and that every method entails systematic biases and trade-offs. The authors propose a lightweight, multi-method paradigm for scalable, reproducible affordance assessment and empirically examine how far large language models can serve as surrogates for specific perceptual evaluation tasks.
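
To make the headline correlation concrete, the sketch below shows how agreement between GPT-4o and human salience ratings might be measured with Pearson's r. This is not code or data from the paper; the rating values, scale, and variable names are hypothetical placeholders.

```python
# Illustrative sketch only: the summary above reports r > 0.9 for GPT-4o on
# salience rating, and Pearson's r is the standard statistic for such
# agreement. All values below are hypothetical, not data from the study.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical mean salience ratings for ten candidate conclusions,
# averaged over human participants vs. repeated GPT-4o queries.
human_ratings = np.array([4.2, 1.8, 3.5, 2.1, 4.8, 2.9, 1.2, 3.9, 2.5, 4.4])
gpt4o_ratings = np.array([4.0, 2.0, 3.7, 2.3, 4.6, 3.1, 1.5, 3.8, 2.2, 4.5])

r, p = pearsonr(human_ratings, gpt4o_ratings)
print(f"Pearson r = {r:.2f} (p = {p:.4f})")  # r near 1 = close agreement
```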

📝 Abstract
A growing body of work on visualization affordances highlights how specific design choices shape reader takeaways from information visualizations. However, mapping the relationship between design choices and reader conclusions often requires labor-intensive crowdsourced studies, generating large corpora of free-response text for analysis. To address this challenge, we explored alternative scalable research methodologies to assess chart affordances. We test four elicitation methods from human-subject studies: free response, visualization ranking, conclusion ranking, and salience rating, and compare their effectiveness in eliciting reader interpretations of line charts, dot plots, and heatmaps. Overall, we find that while no method fully replicates affordances observed in free-response conclusions, combinations of ranking and rating methods can serve as an effective proxy at a broad scale. The two ranking methodologies were influenced by participant bias towards certain chart types and the comparison of suggested conclusions. Rating conclusion salience could not capture the specific variations between chart types observed in the other methods. To supplement this work, we present a case study with GPT-4o, exploring the use of large language models (LLMs) to elicit human-like chart interpretations. This aligns with recent academic interest in leveraging LLMs as proxies for human participants to improve data collection and analysis efficiency. GPT-4o performed best as a human proxy for the salience rating methodology but suffered from severe constraints in other areas. Overall, the discrepancies in affordances we found between various elicitation methodologies, including GPT-4o, highlight the importance of intentionally selecting and combining methods and evaluating trade-offs.
Problem

Research questions and friction points this paper is trying to address.

Comparing methods to study visualization affordances efficiently
Evaluating scalable alternatives to labor-intensive crowdsourced studies
Assessing LLMs as proxies for human chart interpretations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tested four elicitation methods for studying chart affordances
Combined ranking and rating methods as an effective proxy for free response (see the sketch below)
Evaluated GPT-4o as a human proxy; it performed best for salience rating
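
The following hypothetical sketch illustrates one way agreement between a ranking method and free-response-derived affordances could be quantified, using Kendall's tau as a rank correlation. The chart orderings are invented for illustration and do not come from the paper.

```python
# Hypothetical sketch: quantifying agreement between two elicitation methods
# with Kendall's tau. The rankings below are invented, not the study's data.
from scipy.stats import kendalltau

charts = ["line chart", "dot plot", "heatmap"]

# Rank each chart type by how strongly it affords a given conclusion
# (1 = strongest), once via coded free responses and once via the
# visualization-ranking task.
free_response_rank = [1, 3, 2]
ranking_task_rank = [1, 2, 3]

tau, p = kendalltau(free_response_rank, ranking_task_rank)
print(f"Kendall tau = {tau:.2f} (p = {p:.3f})")  # tau = 1: identical orderings
```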