🤖 AI Summary
Existing image-to-SVG and text-to-SVG generation methods suffer from weak semantic understanding, artifact-prone outputs, and limited support for SVG elements other than paths (e.g., polygons, gradients). To address these limitations, we propose the first end-to-end vision-language large model framework for generating syntactically valid, high-fidelity, infinitely scalable SVG code. Our method uses a multimodal Transformer architecture that combines vision-language alignment pretraining with grammar-constrained autoregressive decoding, which lets it cover the full set of SVG primitives while guaranteeing syntactic correctness and visual fidelity. Evaluated on multiple benchmarks, our approach significantly outperforms state-of-the-art methods in both quantitative metrics and qualitative robustness. It supports real-time interactive generation and dynamic web rendering, and we publicly release a fully functional system implementation.
📝 Abstract
Scalable Vector Graphics (SVGs) have become integral to modern image rendering applications due to their infinite scalability and versatility, especially in graphic design and web development. SVGs are essentially long strings of code that adhere to a structured syntax with validity constraints. With the rise of large language models, which excel at generating code in various languages, we aim to generate SVG code in a similar way. Our findings show that a vision-language model can be conditioned to produce valid SVG code that closely resembles input images, effectively enabling vectorization. Additionally, we harness the rich SVG syntax, which encompasses all possible primitives (such as lines, paths, polygons, and text) as well as effects like color gradients that previous methods often missed. We briefly explain how the StarVector model operates, primarily leveraging a vision-language transformer architecture to generate SVG code. We also detail our training and inference procedures. Finally, we provide an interactive demo that allows users to input an image and generate its SVG code autoregressively, featuring real-time rendering that visually demonstrates the SVG generation process.
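To make the "SVG as structured code" framing concrete, the following minimal sketch builds a small hand-written SVG string that uses several of the primitives named above (line, path, polygon, text, and a linear gradient) and checks that it parses as well-formed XML. The sample drawing and the helper names `primitive_tags` and `is_well_formed` are illustrative assumptions, not part of StarVector, and XML well-formedness is only a necessary condition for SVG validity, not a full validity check.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hand-written sample (an assumption for illustration, not model output).
svg_code = f"""<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">
  <defs>
    <linearGradient id="fade" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0" stop-color="#ff7a00"/>
      <stop offset="1" stop-color="#0066ff"/>
    </linearGradient>
  </defs>
  <line x1="5" y1="95" x2="95" y2="95" stroke="black"/>
  <path d="M10 80 Q50 10 90 80" fill="none" stroke="url(#fade)"/>
  <polygon points="50,20 60,40 40,40" fill="url(#fade)"/>
  <text x="50" y="60" text-anchor="middle" font-size="8">SVG</text>
</svg>"""


def primitive_tags(svg_text):
    """Return the set of element names used, with the XML namespace stripped.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(svg_text)
    return {element.tag.split("}")[-1] for element in root.iter()}


def is_well_formed(svg_text):
    """Check XML well-formedness only; this is not a full SVG validity check."""
    try:
        ET.fromstring(svg_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(sorted(primitive_tags(svg_code)))
    # A prefix of the string, like a partially generated sequence, does not parse.
    print(is_well_formed(svg_code), is_well_formed(svg_code[:120]))
```

The truncated prefix fails to parse because its elements are never closed, which is one reason an interactive renderer for token-by-token output would need some way to handle incomplete markup; how the demo does this is not described here.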