🤖 AI Summary
This work addresses the geometric layout errors, such as misaligned connectors, text overflow, and canvas boundary violations, that often render SVG diagrams generated by large language models unusable. The authors propose GeoSVG-RL, a geometry-aware reinforcement learning framework with a two-stage generation process: the model first produces a structured layout plan, then generates SVG code that is rendered by a browser-backed verifier to obtain executable geometric feedback. The approach defines a fine-grained reward over six dimensions (rendering validity, canvas fitting, anchor placement, text containment, graph consistency, and code cleanliness) and trains with Group Relative Policy Optimization (GRPO), which samples multiple candidates per prompt and updates the policy based on their relative quality. According to the authors, the method outperforms state-of-the-art baselines on metrics including arrow-anchor accuracy and text-in-box rate, improving both local geometric precision and graph connectivity.
📝 Abstract
Generating structured, editable diagrams remains a significant challenge for contemporary large language models, despite their proficiency in general-purpose vector code generation. The primary difficulty lies in the structural fragility of the output; minor errors such as misaligned connector endpoints, text labels overlapping borders, or complex layouts drifting beyond the canvas boundaries render the resulting SVG files functionally unusable for professional applications. To address these issues, we introduce GeoSVG-RL, a specialized reinforcement learning framework designed for layout-constrained text-to-SVG generation. Unlike standard training objectives that rely solely on maximizing token-level likelihood, our approach optimizes the policy against explicit, executable geometric feedback. The model first produces a structured layout plan that serves as a geometric contract for the subsequent generation of the SVG code. This code is then rendered through a browser-backed verifier, enabling the calculation of fine-grained rewards across six critical dimensions: rendering validity, canvas fitting, precise anchor placement, text containment, graph consistency, and code cleanliness. We utilize Group Relative Policy Optimization (GRPO) to refine the model, sampling multiple candidates per prompt to facilitate updates based on relative quality. Starting from a supervised warm-start phase on synthetic data, GeoSVG-RL achieves substantial gains in structural reliability, particularly in arrow-anchor accuracy and text-in-box rates. Quantitative evaluations demonstrate that our method consistently outperforms current state-of-the-art systems in local geometric precision and the preservation of graph connectivity, providing a robust pathway toward automated yet reliable technical illustration.
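As a rough illustration of the ideas in the abstract, the sketch below computes two of the six reward dimensions (text containment and canvas fitting) from bounding boxes and then converts per-candidate rewards into group-relative advantages in the style of GRPO. It is not the authors' implementation. The `Box` type, the function names, the tolerance, the weighting scheme, and the use of the population standard deviation are all assumptions made for this example, and the bounding boxes are assumed to have already been extracted by some renderer.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type for this sketch)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, other: "Box", tol: float = 0.0) -> bool:
        # True when `other` lies fully inside this box, within a tolerance.
        return (
            other.x >= self.x - tol
            and other.y >= self.y - tol
            and other.x + other.w <= self.x + self.w + tol
            and other.y + other.h <= self.y + self.h + tol
        )


def text_in_box_rate(pairs: list[tuple[Box, Box]]) -> float:
    """Fraction of (node_box, label_box) pairs where the label fits its node."""
    if not pairs:
        return 1.0
    return sum(node.contains(label) for node, label in pairs) / len(pairs)


def canvas_fit_rate(canvas: Box, boxes: list[Box]) -> float:
    """Fraction of element boxes that lie fully inside the canvas."""
    if not boxes:
        return 1.0
    return sum(canvas.contains(b) for b in boxes) / len(boxes)


def combined_reward(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (the weights are an assumption)."""
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each candidate's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std would also be plausible
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    canvas = Box(0, 0, 100, 100)
    node = Box(10, 10, 40, 20)
    labels = [Box(12, 12, 30, 10), Box(12, 12, 50, 10)]  # second label overflows
    scores = {
        "text": text_in_box_rate([(node, lbl) for lbl in labels]),
        "canvas": canvas_fit_rate(canvas, [node]),
    }
    print(combined_reward(scores, {"text": 1.0, "canvas": 1.0}))
    print(group_relative_advantages([0.9, 0.4, 0.7, 0.2]))
```

Because advantages are computed relative to the group mean, a candidate is rewarded for being better than its siblings for the same prompt rather than for hitting an absolute score, which is the "relative quality" signal the abstract describes.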