StarVector: Generating Scalable Vector Graphics Code from Images and Text

📅 2023-12-17

🏛️ AAAI Conference on Artificial Intelligence

📈 Citations: 8

✨ Influential: 0

🤖 AI Summary

Existing image/text-to-SVG generation methods suffer from weak semantic understanding, artifact-prone outputs, and limited support for non-path SVG elements (e.g., polygons, gradients). To address these limitations, we propose StarVector, a vision-language model that generates SVG code directly from images and text. The model uses a multimodal Transformer architecture to decode SVG code autoregressively, which lets it draw on the full set of SVG primitives rather than paths alone while aiming for valid syntax and high visual fidelity. Evaluated on multiple benchmarks, our approach outperforms prior methods in both quantitative metrics and qualitative robustness. We also provide an interactive demo that renders the SVG in real time as it is generated.

📝 Abstract

Scalable Vector Graphics (SVG) have become integral to modern image rendering applications due to their infinite scalability and versatility, especially in graphic design and web development. SVGs are essentially long strings of code that adhere to a structured syntax with validity constraints. With the rise of large language models, which excel at generating code in various languages, we aim to generate SVG code in a similar way. Our findings show that a vision-language model can be conditioned to produce valid SVG code that closely resembles input images, effectively enabling vectorization. Additionally, we harness the rich SVG syntax, encompassing all possible primitives—such as lines, paths, polygons, text, and effects like color gradients—that previous methods often missed. We briefly explain how the StarVector model operates, primarily leveraging a vision-language transformer architecture to generate SVG code. We also detail our training and inference procedures. Finally, we provide an interactive demo that allows users to input an image and generate its SVG code autoregressively, featuring real-time rendering that visually demonstrates the SVG generation process.
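To make the "structured syntax with validity constraints" point concrete, here is a minimal Python sketch of a post-generation check. It is not part of StarVector: the function name `summarize_svg`, the `PRIMITIVES` set, and the example string are all illustrative assumptions. It only tests that a string is well-formed XML with an `svg` root and counts which primitives appear; it does not validate against the full SVG specification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative subset of SVG element names (an assumption, not the paper's list).
PRIMITIVES = {
    "path", "line", "polygon", "polyline", "rect", "circle",
    "ellipse", "text", "linearGradient", "radialGradient",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_svg(svg_code):
    """Return (is_well_formed_svg, Counter of primitive element names)."""
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError:
        # Truncated or malformed output, e.g. generation stopped mid-tag.
        return False, Counter()
    if local_name(root.tag) != "svg":
        return False, Counter()
    counts = Counter(
        local_name(el.tag)
        for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) in PRIMITIVES
    )
    return True, counts


# Hand-written example mixing several primitive types and a gradient.
EXAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <defs>
    <linearGradient id="g">
      <stop offset="0" stop-color="#f00"/>
      <stop offset="1" stop-color="#00f"/>
    </linearGradient>
  </defs>
  <polygon points="10,90 50,10 90,90" fill="url(#g)"/>
  <line x1="0" y1="95" x2="100" y2="95" stroke="black"/>
  <text x="50" y="60">A</text>
</svg>"""

if __name__ == "__main__":
    ok, counts = summarize_svg(EXAMPLE_SVG)
    print(ok, dict(counts))
```

A check like this could be run on model output before rendering; a string cut off mid-element fails the parse and is reported as not well-formed.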

Problem

Research questions and friction points this paper is trying to address.

Generating scalable vector graphics from images and text

Overcoming semantic and primitive limitations in SVG generation

Addressing evaluation challenges for vector graphics quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal LLM for SVG generation

SVG-Stack dataset with 2M samples

SVG-Bench benchmark for evaluation
