RoboSVG: A Unified Framework for Interactive SVG Generation with Multi-modal Guidance

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of generating interactive, scalable vector graphics (SVGs) from multimodal inputs: text, images, and numerical signals. We propose RoboSVG, the first unified framework for multimodal conditional SVG generation. Methodologically, we construct RoboDraw, a million-scale, high-quality multimodal SVG dataset; design a conditional generative architecture that jointly encodes textual, visual, and numerical inputs; and introduce dedicated SVG synthesis modules together with numerically guided refinement, enabling end-to-end vector path modeling and precise editing. Our contributions are threefold: (1) the first framework supporting four cross-modal SVG generation tasks; (2) state-of-the-art performance in visual fidelity, user-intent alignment, and generalization; and (3) outputs that are editable, interactive, and directly usable as robot motion trajectories, establishing a new paradigm for digital design and embodied intelligence.
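
The claim that generated SVGs can double as robot motion trajectories can be made concrete with a small sketch. The code below is illustrative only and is not RoboSVG's implementation: it assumes the drawing uses absolute `M`, `L`, and `Z` path commands (curves and relative commands are rejected) and treats each `M` as a pen lift that starts a new stroke. The function name `path_to_strokes` is hypothetical.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]

# Tokens are single command letters or (optionally signed) decimal numbers;
# commas and whitespace between coordinates are skipped.
_TOKEN = re.compile(r"[A-Za-z]|-?\d*\.?\d+(?:[eE][-+]?\d+)?")


def path_to_strokes(d: str) -> List[List[Point]]:
    """Split an SVG path `d` string into pen-down strokes of (x, y) waypoints.

    Hypothetical sketch: supports absolute M (move, pen up), L (line) and
    Z (close). Curves (C, Q, A, ...) and relative commands raise ValueError.
    """
    tokens = _TOKEN.findall(d)
    strokes: List[List[Point]] = []
    cmd = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("Z", "z"):
            # Close the current stroke by returning to its first waypoint.
            if strokes and strokes[-1]:
                strokes[-1].append(strokes[-1][0])
            cmd = None
            i += 1
        elif tok in ("M", "L"):
            cmd = tok
            i += 1
        elif tok.isalpha():
            raise ValueError(f"unsupported path command {tok!r} in this sketch")
        else:
            if cmd is None or i + 1 >= len(tokens):
                raise ValueError("malformed path data")
            x, y = float(tokens[i]), float(tokens[i + 1])
            i += 2
            if cmd == "M":
                strokes.append([(x, y)])  # pen lift: start a new stroke
                cmd = "L"  # SVG treats extra pairs after M as implicit lineto
            elif strokes:
                strokes[-1].append((x, y))
            else:
                raise ValueError("L command before any M")
    return strokes
```

For example, `path_to_strokes("M 0 0 L 10 0 L 10 10 Z M 20 20 L 30 30")` returns two strokes: a closed triangle `[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 0.0)]` and a single segment `[(20.0, 20.0), (30.0, 30.0)]`. A real controller would also need to flatten Bézier curves and map SVG user units to workspace coordinates, which this sketch leaves out.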

📝 Abstract
Scalable Vector Graphics (SVGs) are fundamental to digital design and robot control, encoding not only visual structure but also motion paths in interactive drawings. In this work, we introduce RoboSVG, a unified multimodal framework for generating interactive SVGs guided by textual, visual, and numerical signals. Given an input query, the RoboSVG model first produces multimodal guidance, then synthesizes candidate SVGs through dedicated generation modules, and finally refines them under numerical guidance to yield high-quality outputs. To support this framework, we construct RoboDraw, a large-scale dataset of one million examples, each pairing an SVG generation condition (e.g., text, an image, or a partial SVG) with its corresponding ground-truth SVG code. The RoboDraw dataset enables systematic study of four tasks: basic generation (Text-to-SVG, Image-to-SVG) and interactive generation (PartialSVG-to-SVG, PartialImage-to-SVG). Extensive experiments demonstrate that RoboSVG achieves superior query compliance and visual fidelity across tasks, establishing a new state of the art in versatile SVG generation. The dataset and source code of this project will be publicly available soon.
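
The abstract describes each RoboDraw example as a generation condition paired with ground-truth SVG code. A minimal sketch of such a record, and of one plausible way to derive a PartialSVG-to-SVG pair, is given here. The field names, the task labels, and the rule of keeping the first `keep` child elements as the partial condition are all assumptions for illustration; the abstract does not say how RoboDraw's partial SVGs are constructed.

```python
import copy
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Literal

# Hypothetical task labels for the four RoboDraw tasks named in the abstract.
Task = Literal["text2svg", "image2svg", "partialsvg2svg", "partialimage2svg"]

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG tags without "ns0:" prefixes


@dataclass
class RoboDrawExample:
    """Hypothetical record layout: one condition paired with ground-truth SVG code."""
    task: Task
    condition: str  # text prompt, image reference, or partial SVG code
    target_svg: str  # full ground-truth SVG code


def make_partial_svg_example(svg_code: str, keep: int) -> RoboDrawExample:
    """Assumed construction: keep only the first `keep` top-level child elements."""
    root = ET.fromstring(svg_code)
    partial = copy.deepcopy(root)
    for child in list(partial)[keep:]:
        partial.remove(child)  # drop the elements the model must complete
    return RoboDrawExample(
        task="partialsvg2svg",
        condition=ET.tostring(partial, encoding="unicode"),
        target_svg=svg_code,
    )
```

Calling `make_partial_svg_example(svg, keep=1)` on an SVG containing a path, a circle, and a rect yields a condition that contains only the path, while `target_svg` keeps the full drawing. Text-to-SVG and Image-to-SVG records would fill `condition` with a prompt or an image reference instead.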
Problem

Research questions and friction points this paper is trying to address.

Generating interactive SVGs using multimodal guidance signals
Creating scalable vector graphics from text, image, and numerical inputs
Producing high-quality SVG outputs with query compliance and fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified multimodal framework for interactive SVG generation
Generates SVGs using text, visual, and numerical guidance
Large-scale dataset enables systematic study of SVG tasks
Jiuniu Wang
DAMO Academy, Alibaba Group
Gongjie Zhang
DAMO Academy, Alibaba Group
Quanhao Qian
DAMO Academy, Alibaba Group
Junlong Gao
DAMO Academy, Alibaba Group
Deli Zhao
Alibaba DAMO Academy
generative models, multimodal learning, foundation models
Ran Xu
DAMO Academy, Alibaba Group