DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

📅 2026-04-28

🤖 AI Summary
Current evaluations of data visualization agents are largely confined to code sandboxes, single-language generation, and idealized user intents. As a result, they fail to capture the complexity of real-world professional workflows. To address this gap, this work proposes DV-World, the first benchmark grounded in authentic visualization workflows. It comprises 260 tasks centered on three core challenges: native spreadsheet operations, cross-platform visualization evolution, and proactive intent alignment. It also introduces a hybrid evaluation framework with two parts. Table-value Alignment checks numerical fidelity, and a multimodal large language model acting as a judge (MLLM-as-a-Judge) performs joint semantic–visual assessment. Experimental results show that even state-of-the-art models achieve less than 50% overall performance. This underscores their significant limitations in realistic visualization scenarios and establishes DV-World as a robust platform for future research.
📝 Abstract
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet existing benchmarks often suffer from three limitations: confinement to code sandboxes, single-language creation-only tasks, and the assumption of perfect user intent. To bridge these gaps, we introduce DV-World, a benchmark of 260 tasks designed to evaluate DV agents across real-world professional lifecycles. DV-World spans three domains:
- DV-Sheet covers native spreadsheet manipulation, including chart and dashboard creation as well as diagnostic repair.
- DV-Evolution covers adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms.
- DV-Interact covers proactive intent alignment with a user simulator that mimics ambiguous real-world requirements.
Our hybrid evaluation framework integrates Table-value Alignment for numerical precision and MLLM-as-a-Judge with rubrics for semantic-visual assessment. Experiments reveal that state-of-the-art models achieve less than 50% overall performance, exposing critical deficits in handling the complex challenges of real-world data visualization. DV-World provides a realistic testbed to steer development toward the versatile expertise required in enterprise workflows. Our data and code are available at https://github.com/DA-Open/DV-World.
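The abstract names two scoring components: Table-value Alignment and rubric-based MLLM-as-a-Judge scoring. It does not spell out the scoring rules, so the Python sketch below is only a rough illustration of how such a hybrid score could work. Several details are assumptions, not DV-World's actual implementation:
- It assumes each chart's underlying data has already been extracted as series/category/value entries.
- It treats alignment as the fraction of reference values reproduced within a relative tolerance.
- It blends that fraction with the average of judge rubric scores in [0, 1], using an arbitrary weight.

The function names, data layout, tolerance and weight are all hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical data layout (assumption): {series_name: {category_label: value}}
ChartData = Dict[str, Dict[str, float]]


def table_value_alignment(pred: ChartData, ref: ChartData, rel_tol: float = 1e-2) -> float:
    """Fraction of reference data points that the predicted chart reproduces.

    A point counts as matched when the same series and category exist in the
    prediction and the value is within rel_tol (relative) of the reference.
    """
    total = 0
    matched = 0
    for series, points in ref.items():
        pred_points = pred.get(series, {})
        for label, ref_value in points.items():
            total += 1
            if label in pred_points and math.isclose(
                pred_points[label], ref_value, rel_tol=rel_tol, abs_tol=1e-9
            ):
                matched += 1
    return matched / total if total else 0.0


def hybrid_score(alignment: float, rubric_scores: List[float], weight: float = 0.5) -> float:
    """Blend numerical fidelity with judge rubric scores (each assumed in [0, 1]).

    The equal default weighting is an illustrative assumption.
    """
    rubric = sum(rubric_scores) / len(rubric_scores) if rubric_scores else 0.0
    return weight * alignment + (1 - weight) * rubric


if __name__ == "__main__":
    ref = {"Revenue": {"Q1": 120.0, "Q2": 135.5, "Q3": 150.0}}
    pred = {"Revenue": {"Q1": 120.0, "Q2": 135.0, "Q3": 170.0}}
    align = table_value_alignment(pred, ref)  # Q1 and Q2 match, Q3 does not
    print(align, hybrid_score(align, [1.0, 0.5]))
```

In this toy example, Q2 is off by 0.5, which falls within the 1% relative tolerance. Q3 is off by 20 and does not match, so the alignment is 2/3. Under these assumptions, the blended score is the average of 2/3 and the 0.75 rubric mean.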
Problem

Research questions and friction points this paper is trying to address.

data visualization
real-world scenarios
benchmarking
intent alignment
cross-platform evolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

data visualization agents
real-world benchmarking
intent alignment
cross-platform evolution
hybrid evaluation framework
Authors
Jinxiang Meng · Nanjing University of Posts and Telecommunications · LLM Agent, Table Reasoning, Tool Use
Shaoping Huang · Institute of Automation, Chinese Academy of Sciences
Fangyu Lei · Institute of Automation, Chinese Academy of Sciences · LLM-Agent, Code Generation, Text-to-SQL, Table Reasoning
Jingyu Guo · Auransa · Genetics and Genomics, Translational research with model system (Yeast), Bioinformatics
Haoxiang Liu · Institute of Automation, Chinese Academy of Sciences
Jiahao Su · Institute of Automation, Chinese Academy of Sciences
Sihan Wang · Michigan State University · Wireless Network, IoT, Network Security
Yao Wang · University of Chinese Academy of Sciences
Enrui Wang · Institute of Automation, Chinese Academy of Sciences
Ye Yang · Institute of Automation, Chinese Academy of Sciences
Hongze Chai · Institute of Automation, Chinese Academy of Sciences
Jinming Lv · Institute of Automation, Chinese Academy of Sciences
Anbang Yu · Institute of Automation, Chinese Academy of Sciences
Huangjing Zhang · Institute of Automation, Chinese Academy of Sciences
Yitong Zhang · National University of Singapore
Yiming Huang · Institute of Automation, Chinese Academy of Sciences
Zeyao Ma · Renmin University of China · Large Language Model, Code Generation, Reasoning, Table Processing
Shizhu He · Institute of Automation, Chinese Academy of Sciences
Jun Zhao · Institute of Automation, Chinese Academy of Sciences
Kang Liu · Institute of Automation, Chinese Academy of Sciences