OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward

📅 2026-04-07

🤖 AI Summary

Existing approaches to diagram code generation are confined to narrow task formulations and a limited set of code languages, which restricts their applicability to diverse diagram types. This work proposes OmniDiagram, a unified framework that incorporates multiple diagram code languages and task definitions. It also introduces Visual Interrogation Verifies All (Viva), a visual feedback strategy for reinforcement learning (RL). Instead of brittle syntax-based rules or pixel-level matching, Viva generates targeted visual inquiries about the rendered diagram and uses them to give fine-grained feedback on visual fidelity. This enables a self-evolving training process that does not require manually annotated ground-truth code. The framework combines supervised fine-tuning (SFT) with Viva-based RL to optimize the visual structure of rendered outputs, aiming to improve generation quality across diagram types and code languages. The authors also construct M3²Diagram, described as the first large-scale diagram code generation dataset, with over 196k high-quality instances. Experiments show that OmniDiagram establishes new state-of-the-art results across diagram code generation benchmarks.

📝 Abstract

The paradigm of programmable diagram generation is evolving rapidly, playing a crucial role in structured visualization. However, most existing studies are confined to a narrow range of task formulations and language support, constraining their applicability to diverse diagram types. In this work, we propose OmniDiagram, a unified framework that incorporates diverse diagram code languages and task definitions. To address the challenge of aligning code logic with visual fidelity in Reinforcement Learning (RL), we introduce a novel visual feedback strategy named Visual Interrogation Verifies All (Viva). Unlike brittle syntax-based rules or pixel-level matching, Viva rewards the visual structure of rendered diagrams through a generative approach. Specifically, Viva actively generates targeted visual inquiries to scrutinize diagram visual fidelity and provides fine-grained feedback for optimization. This mechanism facilitates a self-evolving training process, effectively obviating the need for manually annotated ground truth code. Furthermore, we construct M3²Diagram, the first large-scale diagram code generation dataset, containing over 196k high-quality instances. Experimental results confirm that the combination of SFT and our Viva-based RL allows OmniDiagram to establish a new state-of-the-art (SOTA) across diagram code generation benchmarks.
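The abstract describes Viva only at a high level: render the generated code, ask targeted questions about the rendered diagram, and turn the answers into a reward. The following is a minimal sketch of how a question-based reward of this kind could be computed, under stated assumptions. The abstract does not give the reward formula, the question generator, or the answering model. Everything below is hypothetical and is not the authors' implementation: `Inquiry`, `viva_style_reward`, the toy renderer and answerer, the zero reward for code that fails to render, and scoring as the fraction of correctly answered inquiries.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Inquiry:
    """One targeted visual question plus the answer implied by the task (hypothetical)."""
    question: str
    expected: str


def _norm(text: str) -> str:
    # Case- and whitespace-insensitive comparison of short answers.
    return text.strip().lower()


def viva_style_reward(
    code: str,
    inquiries: List[Inquiry],
    render_fn: Callable[[str], Optional[bytes]],
    answer_fn: Callable[[bytes, str], str],
) -> float:
    """Fraction of inquiries answered correctly on the rendered diagram.

    Assumption: code that fails to render (render_fn returns None) scores 0.0,
    and an empty inquiry list also scores 0.0.
    """
    image = render_fn(code)
    if image is None or not inquiries:
        return 0.0
    hits = sum(
        _norm(answer_fn(image, q.question)) == _norm(q.expected) for q in inquiries
    )
    return hits / len(inquiries)


# --- Toy stand-ins so the sketch runs without a real renderer or vision model ---

def toy_render(code: str) -> Optional[bytes]:
    # Pretend only Mermaid-like "graph" sources render; the "image" is the text itself.
    return code.encode() if code.lstrip().startswith("graph") else None


def toy_answer(image: bytes, question: str) -> str:
    # Pretend visual QA: answers "yes" if the quoted element appears in the "image".
    element = question.split("'")[1]
    return "yes" if element in image.decode() else "no"


INQUIRIES = [
    Inquiry("Does the diagram contain 'A --> B'?", "yes"),
    Inquiry("Does the diagram contain 'B --> C'?", "yes"),
    Inquiry("Does the diagram contain 'C --> A'?", "no"),
]

if __name__ == "__main__":
    full = "graph LR\n    A --> B\n    B --> C"
    partial = "graph LR\n    A --> B"
    broken = "grph LR\n    A --> B"
    for src in (full, partial, broken):
        print(round(viva_style_reward(src, INQUIRIES, toy_render, toy_answer), 3))
```

Running the script prints 1.0, 0.667, and 0.0. The complete diagram satisfies every inquiry. The partial one misses the B --> C edge. The broken source never renders. A graded score like this gives RL a denser signal than a pass/fail syntax check, which is consistent with the abstract's emphasis on fine-grained feedback. The real system presumably uses an actual renderer and a generative model to produce and answer the questions.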

Problem

Research questions and friction points this paper is trying to address.

diagram code generation

visual fidelity

unified framework

reinforcement learning

programmable diagram

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Diagram Code Generation

Visual Interrogation Reward

Reinforcement Learning