Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes RL-Text2Vis, the first reinforcement learning framework for text-to-visualization generation, addressing key limitations of existing systems—such as insufficient semantic alignment, non-executable code, and poor visual quality—by leveraging execution feedback to holistically improve performance. RL-Text2Vis builds on Group Relative Policy Optimization (GRPO) with a novel multi-objective reward mechanism that integrates post-execution feedback to jointly optimize textual fidelity, code executability, and visual quality. Trained end-to-end on Qwen2.5 (7B/14B), the model achieves a 22% relative improvement over GPT-4o in chart quality on the Text2Vis benchmark and boosts code execution success rates from 78% to 97% relative to its zero-shot baseline. Furthermore, it demonstrates strong generalization on out-of-domain datasets, including VIS-Eval and NVBench.

📝 Abstract
Text-to-Visualization (Text2Vis) systems translate natural language queries over tabular data into concise answers and executable visualizations. While closed-source LLMs generate functional code, the resulting charts often lack semantic alignment and clarity, qualities that can only be assessed post-execution. Open-source models struggle even more, frequently producing non-executable or visually poor outputs. Although supervised fine-tuning can improve code executability, it fails to enhance overall visualization quality, as traditional SFT loss cannot capture post-execution feedback. To address this gap, we propose RL-Text2Vis, the first reinforcement learning framework for Text2Vis generation. Built on Group Relative Policy Optimization (GRPO), our method uses a novel multi-objective reward that jointly optimizes textual accuracy, code validity, and visualization quality using post-execution feedback. By training Qwen2.5 models (7B and 14B), RL-Text2Vis achieves a 22% relative improvement in chart quality over GPT-4o on the Text2Vis benchmark and boosts code execution success from 78% to 97% relative to its zero-shot baseline. Our models significantly outperform strong zero-shot and supervised baselines and also demonstrate robust generalization to out-of-domain datasets like VIS-Eval and NVBench. These results establish GRPO as an effective strategy for structured, multimodal reasoning in visualization generation. We release our code at https://github.com/vis-nlp/RL-Text2Vis.
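The abstract describes two mechanics that can be sketched concretely: a multi-objective reward combining textual accuracy, code validity, and visualization quality, and GRPO's group-relative advantage, which standardizes each sample's reward within its sampled group. The sketch below is a minimal illustration of those two steps only; the function names, reward weights, and scores are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev

def combined_reward(text_score, exec_ok, visual_score,
                    w_text=0.4, w_exec=0.3, w_vis=0.3):
    """Weighted multi-objective reward (weights are illustrative, not the paper's).

    text_score and visual_score are assumed to be in [0, 1]; exec_ok is
    whether the generated visualization code ran without error.
    """
    return w_text * text_score + w_exec * float(exec_ok) + w_vis * visual_score

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize rewards within one sampled group.

    In GRPO, G completions are sampled per prompt; each completion's
    advantage is its reward minus the group mean, divided by the group std.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: a group of 4 sampled outputs for one query, each scored on
# (textual fidelity, executability, visual quality) after execution.
group = [(0.9, True, 0.8), (0.5, True, 0.6), (0.7, False, 0.0), (0.95, True, 0.9)]
rewards = [combined_reward(t, e, v) for t, e, v in group]
advs = grpo_advantages(rewards)
```

Note how the non-executable sample (third in the group) receives zero reward from both the execution and visual-quality terms, so post-execution feedback directly penalizes it relative to its group without needing a learned critic.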
Problem

Research questions and friction points this paper is trying to address.

Text-to-Visualization
visualization quality
code executability
semantic alignment
post-execution feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Text-to-Visualization
Multi-Objective Reward
Post-Execution Feedback
Group Relative Policy Optimization