🤖 AI Summary
Problem: Text-to-chart generation suffers from high code execution failure rates (~15%), semantic hallucinations, and poor color accessibility for users with color vision deficiency. Method: This paper proposes a lightweight multi-agent framework for text-to-chart generation that decouples the process into four stages (draft generation, execution validation, error repair, and quality assessment) using only the off-the-shelf GPT-4o-mini model. Contribution/Results: The results point to single-prompt design, rather than model capability, as the primary bottleneck for execution failures. The work applies multi-agent collaboration to this task and extends evaluation beyond executability to include aesthetics, semantic fidelity, and accessibility. On the Text2Chart31 and ChartX benchmarks, execution error rates drop to 4.5% and 4.6%, respectively. An LLM-based accessibility audit finds that only 33.3% (Text2Chart31) and 7.2% (ChartX) of generated charts are color-vision-friendly, and manual review finds hallucinations in 6 of 100 sampled charts, underscoring the need for the expanded evaluation criteria.
📝 Abstract
Large language models can translate natural-language chart descriptions into runnable code, yet approximately 15% of the generated scripts still fail to execute, even after supervised fine-tuning and reinforcement learning. We investigate whether this persistent error rate stems from model limitations or from reliance on a single-prompt design. To explore this, we propose a lightweight multi-agent pipeline that separates drafting, execution, repair, and judgment, using only an off-the-shelf GPT-4o-mini model. On the Text2Chart31 benchmark, our system reduces execution errors to 4.5% within three repair iterations, outperforming the strongest fine-tuned baseline by nearly 5 percentage points while requiring significantly less compute. Similar performance is observed on the ChartX benchmark, with an error rate of 4.6%, demonstrating strong generalization. Under current benchmarks, execution success appears largely solved. However, manual review reveals that 6 out of 100 sampled charts contain hallucinations, and an LLM-based accessibility audit shows that only 33.3% (Text2Chart31) and 7.2% (ChartX) of generated charts satisfy basic colorblindness guidelines. These findings suggest that future work should shift focus from execution reliability toward improving chart aesthetics, semantic fidelity, and accessibility.
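
The draft, execute, repair, and judge loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `generate_chart`, `run_script`, and `ChartResult` are hypothetical, and the `draft`, `repair`, and `judge` callables stand in for LLM calls (for example, to GPT-4o-mini) whose prompts the abstract does not specify. The only detail taken from the text is the cap of three repair iterations.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_REPAIRS = 3  # the abstract reports results "within three repair iterations"


@dataclass
class ChartResult:  # hypothetical container, not from the paper
    code: str
    ok: bool
    repairs_used: int
    error: str = ""
    verdict: Optional[str] = None


def run_script(code: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Execution stage: run the script in a fresh interpreter and
    return (succeeded, stderr text)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    return proc.returncode == 0, proc.stderr


def generate_chart(
    description: str,
    draft: Callable[[str], str],             # assumed LLM call: description -> code
    repair: Callable[[str, str, str], str],  # assumed LLM call: (description, code, error) -> code
    judge: Callable[[str, str], str],        # assumed LLM call: (description, code) -> verdict
    max_repairs: int = MAX_REPAIRS,
) -> ChartResult:
    code = draft(description)                # drafting stage
    ok, err = run_script(code)               # execution stage
    repairs = 0
    while not ok and repairs < max_repairs:  # repair stage, bounded
        code = repair(description, code, err)
        repairs += 1
        ok, err = run_script(code)
    # Judgment stage runs only on scripts that actually executed.
    verdict = judge(description, code) if ok else None
    return ChartResult(code, ok, repairs, err, verdict)
```

Passing the agents in as plain callables keeps the control flow testable without network access: a stub `draft` that returns a failing script and a stub `repair` that returns a working one exercise the whole loop. Feeding the captured stderr back into `repair` is one plausible way to give the repair agent the failure context; the abstract does not say what information the authors' repair stage receives.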