🤖 AI Summary
Scientific publications frequently contain figures that violate visualization best practices, risking misrepresentation or misinterpretation of data. To address this, we propose the first automated chart compliance assessment framework based on large vision-language models (VLMs), systematically evaluating five open-source VLMs—including Qwen2.5VL—on critical issues such as chart-type classification, 3D distortion detection, legend omission, and missing axis labels. We design visualization-rule-aware prompting strategies and employ multi-dimensional quantitative evaluation using F1-score and RMSE. Our framework achieves strong performance: chart-type identification (F1 = 82.49%), 3D effect detection (F1 = 98.55%), and legend presence detection (F1 = 96.64%). This work constitutes the first systematic validation of VLMs for scientific figure quality auditing, establishing a reproducible methodological foundation for automated scientific image governance.
📝 Abstract
Diagrams are widely used to visualize data in publications. The research field of data visualization defines principles and guidelines for creating and using such diagrams, but researchers are often unaware of these guidelines or fail to follow them, producing figures that convey inaccurate or incomplete information.
In this work, large Vision Language Models (VLMs) are used to analyze diagrams in order to identify potential problems with regard to selected data visualization principles and guidelines. To determine the suitability of VLMs for these tasks, five open-source VLMs and five prompting strategies are compared on a set of questions derived from selected data visualization guidelines.
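As a rough illustration of what a guideline-derived question posed to a VLM might look like, the following sketch pairs a visualization guideline with a yes/no question about a figure. The guideline text, function name, and wording are hypothetical and for illustration only; they are not the paper's actual prompts or prompting strategies.

```python
# Hypothetical sketch of a guideline-aware prompt; the guideline text and
# question wording are illustrative, not taken from the paper.
GUIDELINE = (
    "Data visualization guideline: every axis that encodes data "
    "should carry a descriptive label, including units where applicable."
)

def build_prompt(question: str) -> str:
    """Prepend the guideline so the VLM answers with the rule in context."""
    return (
        f"{GUIDELINE}\n\n"
        f"Question about the attached chart image: {question}\n"
        "Answer strictly with 'yes' or 'no'."
    )

prompt = build_prompt("Does the y-axis have a label?")
print(prompt)
```

In practice such a prompt would be sent together with the chart image to the VLM's multimodal chat interface; constraining the answer format ("yes"/"no") makes the responses easy to score automatically.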
The results show that the employed VLMs can accurately analyze diagram types (F1-score 82.49 %), 3D effects (F1-score 98.55 %), axis labels (F1-score 76.74 %), lines (RMSE 1.16), colors (RMSE 1.60), and legends (F1-score 96.64 %, RMSE 0.70), while they cannot reliably provide feedback on image quality (F1-score 0.74 %) or tick marks and labels (F1-score 46.13 %). Among the evaluated VLMs, Qwen2.5VL performs best, and the summarizing prompting strategy yields the best results for most of the experimental questions.
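To make the two evaluation metrics concrete, the following minimal sketch computes a binary F1-score (used for yes/no checks such as 3D-effect detection) and an RMSE (used for count answers such as the number of lines or legend entries). The data values are invented for illustration; this is not the paper's evaluation pipeline.

```python
from math import sqrt

def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_pred):
    """Root-mean-square error, e.g. for predicted counts of lines/colors."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Invented example: "does this chart use a 3D effect?" (1 = yes, 0 = no)
has_3d_true = [1, 0, 0, 1, 0]
has_3d_pred = [1, 0, 1, 1, 0]
print(round(f1_score(has_3d_true, has_3d_pred), 2))  # 0.8

# Invented example: "how many lines does this line chart contain?"
n_lines_true = [2, 3, 5, 1]
n_lines_pred = [2, 4, 5, 1]
print(round(rmse(n_lines_true, n_lines_pred), 2))  # 0.5
```

Pairing the two metrics this way matches the split in the reported results: F1 scores for the detection-style questions and RMSE for the counting-style ones.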
It is shown that VLMs can automatically identify a number of potential issues in diagrams, such as missing axis labels, missing legends, and unnecessary 3D effects. The approach laid out in this work can be extended to cover further aspects of data visualization.