🤖 AI Summary
Despite growing adoption of generative AI for data visualization design assistance, the underlying reasons for its superior performance over human experts remain unclear.
Method: We systematically compare ChatGPT-3.5, ChatGPT-4, and professional human designers across three dimensions—rhetorical structure, knowledge breadth, and perceptual quality—using a multidimensional evaluation framework grounded in real-world visualization design tasks.
Contribution/Results: ChatGPT-4 displays a hybrid of characteristics from human responses and ChatGPT-3.5. Both models were generally favored over human responses, with their greater feedback coverage and task orientation contributing to higher overall quality. ChatGPT-4 outperforms human designers in technical completeness, cross-chart-type knowledge transfer, and provision of actionable recommendations. This study provides empirical evidence that large language models possess not only broad visualization knowledge but also structured expressive capability, enabling coherent, context-aware knowledge transmission. Our findings offer an empirical basis for AI-augmented design paradigms and introduce a multi-dimensional assessment methodology for evaluating AI-generated design feedback.
📝 Abstract
This paper investigates why recent generative AI models outperform humans in data visualization knowledge tasks. Through a systematic comparative analysis of responses to visualization questions, we find that two ChatGPT models and human respondents differ in rhetorical structure, knowledge breadth, and perceptual quality. ChatGPT-4, as the more advanced model, displays a hybrid of characteristics from both humans and ChatGPT-3.5. Responses from the two models were generally favored over human responses; their strengths in coverage and breadth, together with their emphasis on technical and task-oriented visualization feedback, collectively contributed to higher overall quality. Based on these findings, we draw implications for advancing user experiences, informed by both the potential of LLMs and human perception of their capabilities, with relevance to broader applications of AI.