🤖 AI Summary
This work addresses the split between information visualization and scientific visualization in existing authoring tools, as well as the unreliability and risk of data fabrication in large language model (LLM)-based code generation. The authors propose Raiven, a conversational system built on RaivenDSL, a formally defined domain-specific language that unifies visualization of 2D, 3D, and tabular data in a single representation. The LLM sees only dataset metadata and, under schema-guided constraints, produces a compact, verifiable RaivenDSL specification, which a deterministic compiler translates into executable D3 or VTK.js code. On a 100-task benchmark, Raiven achieves a 100% compilation success rate and is up to six times faster and six times cheaper than state-of-the-art LLMs. An expert user study further shows that Raiven significantly reduces debugging effort and makes it easier to produce correct visualizations.
📝 Abstract
Visualization is central to scientific discovery, yet authoring tools remain split between information and scientific visualization, and expertise in one rarely transfers to the other. Large language model (LLM)-based systems promise to bridge this gap through natural language, but current approaches generate code non-deterministically, with no guarantee of correctness and no protection against silent data fabrication. We present Raiven, a conversational system that mediates visualization authoring through a formally defined domain-specific language. RaivenDSL unifies scientific and information visualization in a single representation spanning 2D, 3D, and tabular data. The LLM produces a compact RaivenDSL specification under schema-guided constraints, and a deterministic compiler translates it to executable D3 or VTK.js code. Because the LLM operates only on dataset metadata, outputs are deterministic, specifications are verifiable before execution, and data fabrication is impossible by construction. In a 100-task benchmark, Raiven achieves 100% compilation and is up to six times faster and six times cheaper than state-of-the-art LLMs, while also improving interaction quality, correctness, and data faithfulness. An expert user study shows that Raiven significantly reduces debugging effort and makes it easier to produce correct visualizations.
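The pipeline described above (a specification written against metadata only, checked before execution, then compiled deterministically) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the spec format is a toy dictionary and not actual RaivenDSL syntax, the names `METADATA`, `SPEC`, `validate`, and `compile_spec` are hypothetical, and the output is a JSON render plan rather than D3 or VTK.js code.

```python
import json

# Hypothetical dataset metadata: field names and types only, no data rows.
METADATA = {
    "source": "measurements.csv",
    "fields": {"time": "number", "temperature": "number", "station": "string"},
}

# Hypothetical toy specification (NOT actual RaivenDSL syntax).
SPEC = {"mark": "line", "x": "time", "y": "temperature", "color": "station"}

ALLOWED_MARKS = {"line", "point", "bar"}
CHANNELS = ("x", "y", "color")


def validate(spec, metadata):
    """Return a list of errors; an empty list means the spec matches the metadata."""
    errors = []
    if spec.get("mark") not in ALLOWED_MARKS:
        errors.append(f"unknown mark: {spec.get('mark')!r}")
    for channel in CHANNELS:
        field = spec.get(channel)
        if field is not None and field not in metadata["fields"]:
            errors.append(f"channel {channel!r} references unknown field {field!r}")
    for required in ("x", "y"):
        if required not in spec:
            errors.append(f"missing required channel {required!r}")
    return errors


def compile_spec(spec, metadata):
    """Translate a validated spec into a render plan; same input, same output."""
    errors = validate(spec, metadata)
    if errors:
        raise ValueError("; ".join(errors))
    plan = {
        "load": metadata["source"],  # data is referenced by name, never inlined
        "mark": spec["mark"],
        "encodings": {c: spec[c] for c in CHANNELS if c in spec},
    }
    return json.dumps(plan, sort_keys=True)


if __name__ == "__main__":
    print(compile_spec(SPEC, METADATA))
```

In this sketch the specification only names fields that exist in the metadata, so a reference to an unknown field is rejected before anything runs, and the plan loads the data by source name instead of embedding values. That is one way to read the claim that fabrication is ruled out by construction; the actual Raiven compiler and its checks may work differently.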