Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

📅 2024-11-25

🏛️ Computer Vision and Pattern Recognition

📈 Citations: 13

✨ Influential: 3

🤖 AI Summary

Existing text-to-SVG generation methods are limited in shape regularity, generalization ability, and expressiveness, while professional SVG authoring remains labor-intensive and difficult for non-experts. To address these challenges, we propose Chat2SVG, a hybrid framework that generates SVGs from natural language in two stages: (1) a large language model (LLM) produces a semantically meaningful SVG template from basic geometric primitives, and (2) a dual-stage optimization guided by image diffusion models refines the paths in latent space and then adjusts point coordinates to increase geometric detail. The system also supports editing through natural language instructions. Extensive experiments show that Chat2SVG outperforms existing methods in visual fidelity, path regularity, and semantic alignment. The code is publicly available.

📝 Abstract

Scalable Vector Graphics (SVG) has become the de facto standard for vector graphics in digital design, offering resolution independence and precise control over individual elements. Despite its advantages, creating high-quality SVG content remains challenging, as it demands technical expertise with professional editing software and a considerable time investment to craft complex shapes. Recent text-to-SVG generation methods aim to make vector graphics creation more accessible, but they still encounter limitations in shape regularity, generalization ability, and expressiveness. To address these challenges, we introduce Chat2SVG, a hybrid framework that combines the strengths of Large Language Models (LLMs) and image diffusion models for text-to-SVG generation. Our approach first uses an LLM to generate semantically meaningful SVG templates from basic geometric primitives. Guided by image diffusion models, a dual-stage optimization pipeline refines paths in latent space and adjusts point coordinates to enhance geometric complexity. Extensive experiments show that Chat2SVG outperforms existing methods in visual fidelity, path regularity, and semantic alignment. Additionally, our system enables intuitive editing through natural language instructions, making professional vector graphics creation accessible to all users. Our code is available at https://chat2svg.github.io/.
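The first stage described above has an LLM emit an SVG template built from basic geometric primitives. The prompt and the exact template format are not given in this summary, so the following is only a minimal sketch under assumptions: a hand-written template (standing in for LLM output) in which each semantic part is a single primitive with a descriptive id, plus a hypothetical helper, list_parts, that recovers that part structure using Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical template standing in for LLM output: one basic primitive per
# semantic part, each tagged with a descriptive id (an assumption, not the
# paper's documented format).
TEMPLATE = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 256 256">
  <circle id="cat_head" cx="128" cy="110" r="60" fill="#f4a460"/>
  <polygon id="left_ear" points="80,70 95,20 115,60" fill="#f4a460"/>
  <polygon id="right_ear" points="141,60 161,20 176,70" fill="#f4a460"/>
  <ellipse id="left_eye" cx="105" cy="105" rx="8" ry="12" fill="#000"/>
  <ellipse id="right_eye" cx="151" cy="105" rx="8" ry="12" fill="#000"/>
  <rect id="collar" x="88" y="160" width="80" height="14" rx="7" fill="#d62728"/>
</svg>"""

SVG_NS = "{http://www.w3.org/2000/svg}"
PRIMITIVES = {"circle", "ellipse", "rect", "polygon", "line", "path"}


def list_parts(svg_text):
    """Return (semantic id, primitive tag) pairs in document order."""
    root = ET.fromstring(svg_text)
    parts = []
    for elem in root.iter():
        # ElementTree prefixes tags with the namespace URI; strip it.
        tag = elem.tag.replace(SVG_NS, "")
        if tag in PRIMITIVES:
            parts.append((elem.get("id"), tag))
    return parts


for part_id, tag in list_parts(TEMPLATE):
    print(f"{part_id}: {tag}")
```

Keeping one named primitive per part means a later refinement step, or a natural-language edit such as recoloring the collar, could address an element by id; whether Chat2SVG relies on ids in exactly this way is an assumption of this sketch.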

Problem

Research questions and friction points this paper is trying to address.

Generating scalable vector graphics with high visual fidelity

Overcoming limitations in shape regularity and semantic alignment

Making professional vector graphics creation accessible through natural language

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM generates SVG templates from geometric primitives

Dual-stage optimization refines paths and adjusts coordinates

Image diffusion models guide geometric complexity enhancement
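The second optimization stage adjusts point coordinates under guidance from an image diffusion model. Reproducing that guidance would need a differentiable rasterizer and a pretrained diffusion model, so the sketch below swaps in a far simpler placeholder objective: the mean squared distance between samples of one cubic Bézier segment and fixed target points, minimized by plain gradient descent on the four control points. All names (bernstein3, eval_bezier, fit_loss, refine_points) and numbers are hypothetical; the sketch illustrates only the general idea of treating control-point coordinates as free parameters of a loss.

```python
# Placeholder objective: NOT the diffusion-guided loss described in the paper.
# Targets are sampled from a known "true" curve so convergence can be checked.


def bernstein3(t):
    """Cubic Bernstein basis weights (b0, b1, b2, b3) at parameter t in [0, 1]."""
    s = 1.0 - t
    return (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)


def eval_bezier(ctrl, t):
    """Point on a cubic Bezier segment with control points ctrl[0..3]."""
    w = bernstein3(t)
    return (sum(wi * p[0] for wi, p in zip(w, ctrl)),
            sum(wi * p[1] for wi, p in zip(w, ctrl)))


def fit_loss(ctrl, targets, ts):
    """Mean squared distance between curve samples and target points."""
    total = 0.0
    for t, (tx, ty) in zip(ts, targets):
        x, y = eval_bezier(ctrl, t)
        total += (x - tx) ** 2 + (y - ty) ** 2
    return total / len(ts)


def refine_points(ctrl, targets, ts, lr=0.5, steps=2000):
    """Plain gradient descent on control-point coordinates under fit_loss."""
    ctrl = [list(p) for p in ctrl]
    n = len(ts)
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in ctrl]
        for t, (tx, ty) in zip(ts, targets):
            x, y = eval_bezier(ctrl, t)
            dx, dy = x - tx, y - ty
            # d(loss)/d(P_i) = (2 / n) * b_i(t) * (B(t) - target)
            for i, wi in enumerate(bernstein3(t)):
                grads[i][0] += 2.0 * wi * dx / n
                grads[i][1] += 2.0 * wi * dy / n
        for p, g in zip(ctrl, grads):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return [tuple(p) for p in ctrl]


ts = [i / 20 for i in range(21)]
true_ctrl = [(0.0, 0.0), (30.0, 60.0), (70.0, 60.0), (100.0, 0.0)]
targets = [eval_bezier(true_ctrl, t) for t in ts]
init_ctrl = [(0.0, 10.0), (20.0, 40.0), (80.0, 40.0), (100.0, 10.0)]

refined = refine_points(init_ctrl, targets, ts)
print(f"loss before: {fit_loss(init_ctrl, targets, ts):.3f}")
print(f"loss after:  {fit_loss(refined, targets, ts):.6f}")
```

Because this placeholder loss is quadratic in the control points, a small fixed step size is enough to drive it toward zero; a diffusion-guided objective carries no such guarantee, and in a real pipeline the gradient would flow back from a rendered image rather than from known target points.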

🔎 Similar Papers

SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout