InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

📅 2025-10-13
🤖 AI Summary
Problem: General-purpose SVG modeling has long been hindered by fragmented datasets, poor cross-task transferability, and the difficulty of capturing structural complexity. Method: We propose InternSVG, a unified multimodal large language model (MLLM) designed specifically for SVGs. It introduces SVG-specific special tokens, subword-based embedding initialization, and a two-stage progressive training strategy to jointly model understanding, editing, and generation. We further construct SAgoge, the largest and most comprehensive multimodal SVG dataset, covering both static graphics and dynamic animations, together with SArena, its companion benchmark. Contribution/Results: Experiments show that InternSVG consistently outperforms leading open-source and proprietary models on SArena and a prior benchmark, and that the unified formulation induces positive transfer across tasks. The work points toward unified representation and automated processing of SVG content.

📝 Abstract
General SVG modeling remains challenging due to fragmented datasets, limited transferability of methods across tasks, and the difficulty of handling structural complexity. In response, we leverage the strong transfer and generalization capabilities of multimodal large language models (MLLMs) to achieve unified modeling for SVG understanding, editing, and generation. We present the InternSVG family, an integrated data-benchmark-model suite. At its core is SAgoge, the largest and most comprehensive multimodal dataset for SVG tasks, encompassing both static graphics and dynamic animations. It covers icons, long-sequence illustrations, scientific diagrams, and dynamic animations, supports tasks of varied difficulty levels, and provides deeper hierarchies with richer attributes than previous datasets. Based on this resource, we introduce SArena, a companion benchmark with comprehensive task definitions and standardized evaluation that aligns with the domains and difficulty spectrum covered by SAgoge. Building on these foundations, we propose InternSVG, a unified MLLM for SVG understanding, editing, and generation with SVG-specific special tokens, subword-based embedding initialization, and a two-stage training strategy that progresses from short static SVGs to long-sequence illustrations and complex animations. This unified formulation induces positive transfer and improves overall performance. Experiments on SArena and a prior benchmark confirm that InternSVG achieves substantial gains and consistently outperforms leading open and proprietary counterparts.
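The abstract names subword-based embedding initialization for the new SVG-specific special tokens but does not spell out the procedure. A common way to do this, assumed here rather than taken from the paper, is to set each new token's embedding to the mean of the embeddings of the subword pieces the base tokenizer already produces for that token's text. The token names (`<svg_path>` and so on), the toy tokenizer, and the 2-dimensional embedding table below are all hypothetical; a real model would use its own tokenizer and embedding matrix.

```python
from typing import Callable, Dict, List

Embedding = List[float]

# Hypothetical SVG-specific special tokens; the paper's actual vocabulary is not given here.
SVG_SPECIAL_TOKENS = ["<svg_path>", "<svg_circle>", "<svg_fill>"]

# Toy stand-in for a base LLM's input-embedding table (2-dimensional for readability).
TOY_EMBEDDINGS: Dict[str, Embedding] = {
    "svg": [1.0, 0.0],
    "path": [0.0, 1.0],
    "circle": [1.0, 1.0],
    "fill": [0.5, 0.5],
}


def toy_tokenize(token: str) -> List[str]:
    """Hypothetical subword tokenizer: strips angle brackets and splits on underscores."""
    return [piece for piece in token.strip("<>").split("_") if piece]


def subword_mean_init(
    token: str,
    tokenize: Callable[[str], List[str]],
    table: Dict[str, Embedding],
) -> Embedding:
    """Initialize a new token's embedding as the mean of its subword embeddings."""
    pieces = tokenize(token)
    if not pieces:
        raise ValueError(f"tokenizer produced no subwords for {token!r}")
    dim = len(table[pieces[0]])
    total = [0.0] * dim
    for piece in pieces:
        for i, value in enumerate(table[piece]):
            total[i] += value
    return [value / len(pieces) for value in total]


def add_special_tokens(
    new_tokens: List[str],
    tokenize: Callable[[str], List[str]],
    table: Dict[str, Embedding],
) -> Dict[str, Embedding]:
    """Return a copy of the table extended with subword-initialized special tokens."""
    extended = dict(table)
    for token in new_tokens:
        if token not in extended:
            # Look subwords up in the original table so they come from the base vocabulary.
            extended[token] = subword_mean_init(token, tokenize, table)
    return extended


if __name__ == "__main__":
    vocab = add_special_tokens(SVG_SPECIAL_TOKENS, toy_tokenize, TOY_EMBEDDINGS)
    for tok in SVG_SPECIAL_TOKENS:
        print(tok, vocab[tok])
```

Starting each new token near the average of its constituent subwords, rather than at a random point, places it in a region of embedding space the pretrained model already interprets, which is the usual motivation for this kind of initialization.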
Problem

Research questions and friction points this paper is trying to address.

Unifying SVG understanding, editing, and generation tasks
Addressing fragmented datasets and limited method transferability
Handling structural complexity in static and dynamic SVGs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging MLLMs for unified SVG understanding, editing, and generation
Introducing SAgoge, the largest multimodal SVG dataset, including animations
Proposing the InternSVG model with SVG-specific tokens and two-stage training
Haomin Wang
Shanghai AI Laboratory | Shanghai Jiao Tong University
Computer Vision, Multimodal Large Language Models
Jinhui Yin
Nanjing University
Qi Wei
Associate Professor of Bioengineering, George Mason University
Biomechanics, modeling and simulation, biomedical imaging
Wenguang Zeng
Donghua University
Lixin Gu
Shanghai AI Laboratory
Shenglong Ye
Shanghai AI Laboratory
Zhangwei Gao
Shanghai Jiao Tong University
Yaohui Wang
Research Scientist, Shanghai AI Laboratory | Inria
Machine Learning, Deep Generative Models, Video Generation
Yanting Zhang
Donghua University
Yuanqi Li
Nanjing University
Yanwen Guo
Nanjing University
Wenhai Wang
The Chinese University of Hong Kong
Kai Chen
Shanghai AI Laboratory
Yu Qiao
Shanghai AI Laboratory
Hongjie Zhang
Nanjing University; Shanghai Artificial Intelligence Laboratory
Computer Vision