🤖 AI Summary
Existing chart understanding benchmarks primarily focus on static visualizations, making them inadequate for evaluating models’ comprehension of real-world interactive, dynamic charts. This work presents ChartAct, an interactive benchmark designed to assess understanding of dynamic charts whose key information may appear only after user interactions such as hovering, clicking, zooming, or dragging. The benchmark comprises 1,440 high-quality question-answer samples built from 673 dynamic charts collected from 8 real chart websites and covering 7 common chart types. Each sample is instantiated in two environments, Dynamic Chart and Dashboard Chart. An evaluation of 11 advanced multimodal models and GUI agents reveals a significant performance gap: the best-performing model (Claude-Opus-4.7) achieves an average success rate of 84.5%, while most models score below 60%, underscoring the limitations of current approaches in understanding dynamic, interactive charts.
📝 Abstract
Charts are widely used to present complex data for analysis and decision making. Existing chart understanding benchmarks mainly focus on static charts, but real-world charts are often dynamic and interactive: key information may appear only after actions such as hovering, clicking, zooming, or dragging. Dynamic chart understanding therefore requires models to identify visible content, choose appropriate interactions, and reason over changing chart states. To evaluate this ability, we propose ChartAct, an interactive benchmark for dynamic chart understanding. ChartAct collects and filters 673 dynamic charts from 8 real chart websites, covers 7 common chart types, and constructs 1,440 high-quality question-answer samples. Each sample is instantiated in two environments, Dynamic Chart and Dashboard Chart, to evaluate dynamic chart understanding under different contexts. Based on ChartAct, we systematically evaluate 11 advanced multimodal models and GUI agents. Experimental results show that existing models still have clear limitations in dynamic chart understanding. The strongest model, Claude-Opus-4.7, achieves an average success rate of 84.5%, while most models remain below 60%. We also conduct detailed failure attribution and case analysis. ChartAct provides a new benchmark for studying chart understanding in real interactive environments. Code is available at https://github.com/wulin-wulin/OSWorld_Chart