🤖 AI Summary
Latent defects in data visualization libraries, such as misleading charts, are frequently overlooked yet critically impair information fidelity and decision reliability. Method: We conduct the first systematic empirical study of 564 defects collected from five widely used visualization libraries, identifying erroneous graphical computation as the predominant root cause. We propose a comprehensive DataViz defect taxonomy covering symptoms, root causes, and triggering paths; develop an eight-step triggering model and two domain-specific test oracles; and empirically evaluate vision-language models (GPT-4V, LLaVA) for defect detection. Contribution/Results: Our evaluation shows limited practical efficacy, with detection accuracy ranging from only 29% to 57% depending on the prompt. We release the first publicly available DataViz defect dataset and a reusable, open-source testing-methodology framework to support future research and tool development.
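To make concrete why a misleading chart impairs decision reliability, here is a minimal toy calculation (an illustrative assumption, not an example from the study): truncating a bar chart's y-axis inflates the visual ratio between bars, so a small real difference can look dramatic.

```python
def perceived_bar_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """Visual ratio between two bars when the y-axis starts at axis_min.

    Each bar is drawn with height (value - axis_min), so a truncated
    axis (axis_min > 0) exaggerates the apparent difference. This toy
    function is illustrative only, not taken from the paper.
    """
    return (a - axis_min) / (b - axis_min)

# Honest axis starting at 0: a 5% difference looks like a 5% difference.
print(perceived_bar_ratio(105, 100))      # 1.05

# Axis truncated to start at 95: the same data looks twice as large.
print(perceived_bar_ratio(105, 100, 95))  # 2.0
```

The same distortion arises silently when a library's graphic-computation code picks axis bounds incorrectly, which is why such defects rarely crash but still mislead.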
📝 Abstract
Data visualization (DataViz) libraries play a crucial role in presentation, data analysis, and application development, underscoring the importance of their accuracy in transforming data into visual representations. Incorrect visualizations can degrade user experience, distort the information conveyed, and skew user perception and decision-making. Visual bugs in these libraries can be particularly insidious: rather than causing obvious failures such as crashes, they graphically mislead users about the underlying data, resulting in wrong decisions. Consequently, a good understanding of the unique characteristics of bugs in DataViz libraries is essential for researchers and developers to detect and fix them. This study presents the first comprehensive analysis of bugs in DataViz libraries, examining 564 bugs collected from five widely used libraries. We systematically analyze their symptoms and root causes and provide a detailed taxonomy. We found that incorrect/inaccurate plots are pervasive in DataViz libraries and that incorrect graphic computation is the major root cause, which calls for further automated testing methods for DataViz libraries. Moreover, we identified eight key steps to trigger such bugs and two test oracles specific to DataViz libraries, which may inspire future research on effective automated testing techniques. Furthermore, given recent advances in Vision Language Models (VLMs), we explored the feasibility of applying these models to detect incorrect/inaccurate plots. The results show that the effectiveness of VLMs in bug detection ranges from 29% to 57% depending on the prompts, and that adding more information to prompts does not necessarily increase effectiveness. More findings can be found in our manuscript.
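To sketch what a DataViz-specific test oracle can look like, here is a minimal, self-contained toy (the `rasterize_bars` renderer and the metamorphic check below are illustrative assumptions, not the oracles from the study): scaling every input value by a constant must leave the normalized bar heights unchanged, which flags graphic-computation bugs without needing a ground-truth image.

```python
def rasterize_bars(data, height=100):
    """Toy rendering path: map each value to a pixel-column height,
    scaled so the maximum value fills the canvas. Stands in for a
    real DataViz library's bar-chart computation."""
    peak = max(data)
    return [round(v / peak * height) for v in data]

def metamorphic_oracle(render, data, factor=2):
    """Metamorphic test oracle: multiplying all inputs by a constant
    must not change the normalized rendering. A violation signals a
    graphic-computation bug even though nothing crashed."""
    return render(data) == render([v * factor for v in data])

print(metamorphic_oracle(rasterize_bars, [3, 1, 4, 1, 5]))  # True for a correct renderer
```

Oracles of this shape sidestep the hardest part of testing visual output, namely deciding what the "correct" pixels are, by asserting a relation between two renderings instead of comparing against a reference image.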