OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward

📅 2026-04-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing approaches to chart code generation are constrained by task-specific formulations and programming languages, limiting their ability to support diverse chart types and structured visualization requirements. This work proposes OmniDiagram, a unified framework that integrates multilingual chart code and task definitions, and introduces the novel Visual Interrogation Verifies All (Viva) mechanism. Viva replaces conventional syntactic or pixel-level matching with generative visual interrogation that provides fine-grained visual-fidelity feedback, enabling self-evolving training without human annotations. Leveraging Viva, the framework combines supervised fine-tuning and reinforcement learning to optimize the structural alignment of rendered outputs. The work also introduces M3²Diagram, the first large-scale chart code dataset. Experiments demonstrate that OmniDiagram achieves new state-of-the-art performance across multiple benchmarks and substantially improves generation quality in cross-chart-type and cross-language scenarios.
πŸ“ Abstract
The paradigm of programmable diagram generation is evolving rapidly, playing a crucial role in structured visualization. However, most existing studies are confined to a narrow range of task formulations and language support, constraining their applicability to diverse diagram types. In this work, we propose OmniDiagram, a unified framework that incorporates diverse diagram code languages and task definitions. To address the challenge of aligning code logic with visual fidelity in Reinforcement Learning (RL), we introduce a novel visual feedback strategy named Visual Interrogation Verifies All (Viva). Unlike brittle syntax-based rules or pixel-level matching, Viva rewards the visual structure of rendered diagrams through a generative approach. Specifically, Viva actively generates targeted visual inquiries to scrutinize diagram visual fidelity and provides fine-grained feedback for optimization. This mechanism facilitates a self-evolving training process, effectively obviating the need for manually annotated ground truth code. Furthermore, we construct M3²Diagram, the first large-scale diagram code generation dataset, containing over 196k high-quality instances. Experimental results confirm that the combination of SFT and our Viva-based RL allows OmniDiagram to establish a new state-of-the-art (SOTA) across diagram code generation benchmarks.
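Based on the abstract's description, a Viva-style reward can be sketched as a loop that poses targeted visual questions about a rendered diagram and scores the fraction answered as expected. The minimal Python sketch below is an illustration only, not the authors' implementation: `Inquiry`, `viva_reward`, and `mock_vlm_answer` (a stand-in for a vision-language model inspecting the rendered output) are all hypothetical names introduced here.

```python
from dataclasses import dataclass

@dataclass
class Inquiry:
    question: str  # targeted visual question, e.g. "What is the chart type?"
    expected: str  # answer derived from the task specification

def viva_reward(inquiries, answer_fn):
    """Return the fraction of visual inquiries the rendered diagram passes, in [0, 1]."""
    if not inquiries:
        return 0.0
    passed = sum(1 for q in inquiries if answer_fn(q.question) == q.expected)
    return passed / len(inquiries)

# Toy stand-in for a VLM that inspects a rendered diagram's visible properties.
rendered = {"chart_type": "bar", "y_label": "Sales", "n_series": 2}

def mock_vlm_answer(question):
    if "chart type" in question:
        return rendered["chart_type"]
    if "y-axis label" in question:
        return rendered["y_label"]
    if "series" in question:
        return str(rendered["n_series"])
    return "unknown"

inquiries = [
    Inquiry("What is the chart type?", "bar"),
    Inquiry("What is the y-axis label?", "Sales"),
    Inquiry("How many series are plotted?", "3"),  # deliberately mismatched spec
]

reward = viva_reward(inquiries, mock_vlm_answer)  # 2 of 3 inquiries pass
```

Because the reward is computed from the rendered image rather than the code text, it stays meaningful across different diagram languages, which is what makes the described RL stage language-agnostic.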
Problem

Research questions and friction points this paper is trying to address.

diagram code generation
visual fidelity
unified framework
reinforcement learning
programmable diagram
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Diagram Code Generation
Visual Interrogation Reward
Reinforcement Learning
Self-evolving Training
Large-scale Dataset
Authors
Haoyue Yang (Institute of Automation, Chinese Academy of Sciences)
Xuanle Zhao (Institute of Automation, Chinese Academy of Sciences)
Xuexin Liu (Institute of Automation, Chinese Academy of Sciences)
Feibang Jiang (University of Chinese Academy of Sciences)
Yao Zhu (Zhejiang University; Robust machine learning)