VisCoder2: Building Multi-Language Visualization Coding Agents

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses three bottlenecks that limit existing LLM-driven visualization coding agents in practical workflows: narrow programming-language coverage, unreliable code execution, and the absence of iterative refinement. To tackle these, the authors propose a solution with two core components: (1) VisCode-Multi-679K, a large-scale, multilingual, multi-turn dialogue dataset, together with VisPlotBench, a new executable evaluation benchmark; and (2) the VisCoder2 model family, which supports 12 programming languages and combines large-scale supervised training, multi-turn self-debugging, and cross-language execution verification. Experiments show that the 32B VisCoder2 reaches an 82.4% execution pass rate, significantly outperforming open-source baselines and approaching GPT-4.1, with particularly strong gains in symbolic or compiler-dependent languages.

📝 Abstract
Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable execution, and lack of iterative correction mechanisms. Progress has been constrained by narrow datasets and benchmarks that emphasize single-round generation and single-language tasks. To address these challenges, we introduce three complementary resources for advancing visualization coding agents. VisCode-Multi-679K is a large-scale, supervised dataset containing 679K validated and executable visualization samples with multi-turn correction dialogues across 12 programming languages. VisPlotBench is a benchmark for systematic evaluation, featuring executable tasks, rendered outputs, and protocols for both initial generation and multi-round self-debug. Finally, we present VisCoder2, a family of multi-language visualization models trained on VisCode-Multi-679K. Experiments show that VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4.1, with further gains from iterative self-debug, reaching 82.4% overall execution pass rate at the 32B scale, particularly in symbolic or compiler-dependent languages.
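The multi-round self-debug protocol described in the abstract (generate, execute, feed the error back, revise) can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: the `generate` callable standing in for the model is a hypothetical interface, and the execution harness here only runs Python scripts in a subprocess, whereas VisPlotBench covers 12 languages and checks rendered outputs.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_code(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Execute a Python script in a subprocess; return (success, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return proc.returncode == 0, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "timeout"
    finally:
        Path(path).unlink(missing_ok=True)


def self_debug(generate, task: str, max_rounds: int = 3) -> tuple[str, bool]:
    """Generate code for `task`, then iteratively revise it from execution errors.

    `generate(task, error)` is a hypothetical model interface: error=None asks
    for an initial attempt; a non-None error asks for a corrected version.
    """
    code = generate(task, error=None)
    for _ in range(max_rounds):
        ok, err = run_code(code)
        if ok:
            return code, True  # executed cleanly, stop debugging
        code = generate(task, error=err)  # feed the traceback back to the model
    ok, _ = run_code(code)
    return code, ok
```

In the benchmark's terms, the pass rate after this loop corresponds to the "self-debug" protocol, while the first `run_code` result alone corresponds to initial generation.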
Problem

Research questions and friction points this paper is trying to address.

Building multi-language visualization coding agents
Addressing limited language coverage and unreliable execution
Overcoming narrow datasets and single-round generation constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-language dataset with 679K executable visualization samples
Benchmark for systematic evaluation of generation and debugging
Family of models trained on large-scale multi-language dataset
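The headline metric throughout (e.g. the 82.4% figure) is an execution pass rate aggregated across languages. A minimal sketch of that aggregation, assuming results are recorded as (language, passed) pairs; the field names and exact weighting are illustrative, not taken from the paper:

```python
from collections import defaultdict


def execution_pass_rates(results: list[tuple[str, bool]]) -> tuple[dict[str, float], float]:
    """Compute per-language and overall execution pass rates.

    `results` is a list of (language, passed) pairs, one per benchmark task.
    Returns ({language: pass_rate}, overall_pass_rate).
    """
    by_lang: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for lang, passed in results:
        by_lang[lang][1] += 1
        by_lang[lang][0] += int(passed)
    rates = {lang: p / n for lang, (p, n) in by_lang.items()}
    overall = sum(p for p, _ in by_lang.values()) / sum(n for _, n in by_lang.values())
    return rates, overall
```

The overall rate here is task-weighted (every task counts equally); a macro average over languages would weight each language equally instead, and the source does not specify which convention the 82.4% uses.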