ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the first end-to-end chart editing evaluation framework that operates without intermediate representations such as natural language or code, addressing the challenge of simultaneously preserving global structural consistency and enabling fine-grained editing. The framework defines two core tasks, local appearance adjustments and global data-driven transformations, and introduces a high-quality, human-verified dataset of multimodal triplets, each pairing a chart image with its generation code and a multimodal editing instruction. This structure supports both objective and subjective evaluation. Comprehensive benchmarking of state-of-the-art multimodal large language models reveals significant performance bottlenecks on global editing tasks, highlighting a critical limitation of current end-to-end chart editing capabilities.

📝 Abstract
Charts are a fundamental visualization format for structured data analysis. Enabling end-to-end chart editing according to user intent is of great practical value, yet remains challenging due to the need for both fine-grained control and global structural consistency. Most existing approaches adopt pipeline-based designs, where natural language or code serves as an intermediate representation, limiting their ability to faithfully execute complex edits. We introduce ChartE$^{3}$, an End-to-End Chart Editing benchmark that directly evaluates models without relying on intermediate natural language programs or code-level supervision. ChartE$^{3}$ focuses on two complementary editing dimensions: local editing, which involves fine-grained appearance changes such as font or color adjustments, and global editing, which requires holistic, data-centric transformations including data filtering and trend line addition. ChartE$^{3}$ contains over 1,200 high-quality samples constructed via a well-designed data pipeline with human curation. Each sample is provided as a triplet of a chart image, its underlying code, and a multimodal editing instruction, enabling evaluation from both objective and subjective perspectives. Extensive benchmarking of state-of-the-art multimodal large language models reveals substantial performance gaps, particularly on global editing tasks, highlighting critical limitations in current end-to-end chart editing capabilities.
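The triplet structure described in the abstract (chart image, underlying code, multimodal editing instruction, split across local and global edit types) can be sketched as a simple data record. All names and field choices below are illustrative assumptions, not the benchmark's actual schema:

```python
from dataclasses import dataclass
from enum import Enum


class EditType(Enum):
    LOCAL = "local"    # fine-grained appearance changes (e.g. font, color)
    GLOBAL = "global"  # data-centric transformations (e.g. filtering, trend lines)


@dataclass
class ChartEditSample:
    """One hypothetical ChartE^3-style triplet: image + code + instruction."""
    chart_image: str   # path to the rendered chart image
    source_code: str   # plotting code that generated the chart
    instruction: str   # multimodal editing instruction
    edit_type: EditType


sample = ChartEditSample(
    chart_image="charts/bar_0001.png",
    source_code="import matplotlib.pyplot as plt\nplt.bar(['A', 'B'], [3, 5])",
    instruction="Filter out category 'B' and add a trend line.",
    edit_type=EditType.GLOBAL,
)
```

Keeping the generation code alongside the image is what enables the paper's dual evaluation: the code gives an objective ground truth to render against, while the image and instruction support subjective judgment of the edit.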
Problem

Research questions and friction points this paper is trying to address.

end-to-end chart editing
chart visualization
multimodal instruction
global structural consistency
fine-grained control
Innovation

Methods, ideas, or system contributions that make the work stand out.

end-to-end chart editing
multimodal benchmark
global structural consistency
local appearance editing
code-free evaluation
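The summary and tags mention objective, code-free evaluation of edited charts. One minimal illustration of what an objective image-level check could look like is a pixel agreement rate between the model's edited chart and a reference rendering; this is purely a sketch, not the benchmark's actual metric:

```python
def pixel_match_rate(img_a, img_b):
    """Fraction of identical pixels between two equally sized images,
    represented here as nested lists of pixel values."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    matches = sum(a == b for a, b in zip(flat_a, flat_b))
    return matches / len(flat_a)


# Toy 2x3 "images": the prediction differs from the reference in one pixel.
ref = [[0, 0, 1], [1, 1, 0]]
pred = [[0, 0, 1], [1, 0, 0]]
print(pixel_match_rate(ref, pred))  # → 0.8333333333333334 (5 of 6 pixels agree)
```

Real metrics for this task would need to tolerate benign rendering differences (anti-aliasing, minor layout shifts), which is part of why the paper pairs objective scores with subjective human-aligned evaluation.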