🤖 AI Summary
Scientific chart editing is fundamentally a structured data transformation task—not a pixel-level image manipulation problem—yet existing generative models (e.g., diffusion and autoregressive models) fail to ensure semantic correctness due to their neglect of graphical syntax constraints. Method: We formally characterize the structural nature of chart editing for the first time and introduce FigEdit, the first large-scale, multi-task benchmark comprising 10 chart types and five progressively complex editing tasks: single-step, multi-step, conversational, vision-guided, and style transfer. Built from real scientific publications, FigEdit integrates traditional metrics (SSIM, PSNR) with human evaluation. Results: Experiments reveal severe performance degradation of state-of-the-art models on FigEdit, empirically validating the intrinsic limitations of pixel-based approaches. This work establishes a theoretical foundation, standardized evaluation protocol, and empirical evidence to advance structure-aware editing models that jointly preserve visual fidelity and semantic integrity.
📝 Abstract
Generative models, such as diffusion and autoregressive approaches, have demonstrated impressive capabilities in editing natural images. Applying these tools to scientific charts, however, rests on a flawed assumption: that a chart is merely an arrangement of pixels. A chart is in fact a visual representation of structured data governed by a graphical grammar, so chart editing is not a pixel-manipulation task but a structured transformation problem. To address this fundamental mismatch, we introduce *FigEdit*, a large-scale benchmark for scientific figure editing comprising over 30,000 samples. Grounded in real-world data, our benchmark is distinguished by its diversity, covering 10 distinct chart types and a rich vocabulary of complex editing instructions. The benchmark is organized into five distinct and progressively challenging tasks: single-step edits, multi-step edits, conversational edits, vision-guided edits, and style transfer. Our evaluation of a range of state-of-the-art models on this benchmark reveals their poor performance on scientific figures: they consistently fail to perform the underlying structured transformations required for valid edits. Furthermore, our analysis indicates that traditional evaluation metrics (e.g., SSIM, PSNR) have limitations in capturing the semantic correctness of chart edits. Our benchmark demonstrates the profound limitations of pixel-level manipulation and provides a robust foundation for developing and evaluating future structure-aware models. By releasing *FigEdit* (https://github.com/adobe-research/figure-editing), we aim to enable systematic progress in structure-aware figure editing, provide a common ground for fair comparison, and encourage future research on models that understand both the visual and semantic layers of scientific charts.
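The metric limitation noted above can be illustrated with a toy example. The sketch below is pure Python with a hand-rolled PSNR over synthetic 8×8 bar-chart grids (not the benchmark's evaluation code): a pixel metric ranks an edit that silently corrupts a data value above an edit that preserves every value but re-lays-out the bars.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized grayscale grids."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means more pixel-similar."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * math.log10(peak ** 2 / m)

def render_bars(heights, h=8):
    """Toy 'bar chart': one column per bar, ink (255) filled from the bottom up."""
    return [[255 if h - r <= hc else 0 for hc in heights] for r in range(h)]

original = render_bars([6, 6, 3, 3, 5, 5, 2, 2])
# Edit A corrupts the data (third/fourth bar: 3 -> 4) -- a tiny pixel change.
data_corrupted = render_bars([6, 6, 4, 4, 5, 5, 2, 2])
# Edit B preserves every value but reorders the bars -- a large pixel change.
relayout = render_bars([2, 2, 6, 6, 3, 3, 5, 5])

# PSNR prefers the semantically wrong edit over the semantically faithful one.
print(psnr(original, data_corrupted) > psnr(original, relayout))  # True
```

The same inversion holds for any purely pixel-wise score, which is why structure-aware evaluation (and human judgment) is needed alongside SSIM/PSNR.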