VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

📅 2026-05-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing methods for text-to-SVG animation generation struggle to preserve topological consistency while also modeling non-rigid deformations, and they lack support for open-domain instructions. This work proposes VAnim, the first large language model framework tailored for open-domain, text-driven SVG animation generation. VAnim formulates animation as sparse state updates over a persistent SVG DOM tree, which reduces sequence length by more than 9.8×. By combining an Identification-First Motion Planning mechanism with rendering-aware reinforcement learning (GRPO) driven by a hybrid reward from a video perception encoder, VAnim generates high-fidelity motion while maintaining structural validity and identity consistency. The authors also introduce SVGAnim-134k, the first vector animation benchmark, and report that VAnim significantly outperforms existing approaches in semantic alignment and structural validity, with additional appendix metrics supporting motion quality and identity preservation.
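The idea of sparse state updates on a persistent DOM can be illustrated with a minimal Python sketch. This is not the paper's implementation: the update format (one dictionary per keyframe, mapping an element `id` to its changed attributes), the sample SVG, and the function name `apply_updates` are all assumptions made for illustration. The point is only that each step touches the attributes that change, while the tree and every non-participating element are carried over untouched.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix

# Hypothetical static SVG used as the persistent DOM tree.
BASE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <rect id="sky" width="100" height="100" fill="#cde"/>
  <circle id="sun" cx="20" cy="20" r="8" fill="gold"/>
  <path id="wing" d="M 40 60 Q 50 50 60 60" stroke="black" fill="none"/>
</svg>"""

# Assumed update format: one dict per keyframe, element id -> changed attributes.
# Editing the path's "d" attribute is a geometry-level (non-rigid) change.
UPDATES = [
    {"sun": {"cx": "30"}},
    {"sun": {"cx": "40"}, "wing": {"d": "M 40 60 Q 50 40 60 60"}},
]


def apply_updates(svg_text, updates):
    """Apply sparse attribute updates in order and return one SVG string per frame."""
    root = ET.fromstring(svg_text)
    # Index elements by id once; the tree itself is never rebuilt.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    frames = [ET.tostring(root, encoding="unicode")]  # frame 0 is the input
    for update in updates:
        for elem_id, attrs in update.items():
            target = by_id[elem_id]  # raises KeyError for an unknown id
            for name, value in attrs.items():
                target.set(name, value)
        frames.append(ET.tostring(root, encoding="unicode"))
    return frames


frames = apply_updates(BASE_SVG, UPDATES)
```

In this sketch the `sky` rectangle never appears in an update, so it is identical in every frame, and the element order of the document cannot change because no update can add or remove nodes.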
📝 Abstract
Scalable Vector Graphics (SVG) animation generation is pivotal for professional design because vector graphics offer structural editability and resolution independence. However, this task remains challenging as it requires bridging discrete code representations with continuous visual dynamics. Existing optimization-based methods often destroy topological consistency, while general-purpose LLMs rely on rigid CSS/SMIL transformations, failing to model geometry-level non-rigid deformations. To address these limitations, we present VAnim, the first LLM-based framework for open-domain text-to-SVG animation. We reconceptualize animation not as sequence generation, but as Sparse State Updates (SSU) on a persistent SVG DOM tree. This paradigm compresses sequence length by over 9.8× while preserving the SVG DOM structure and non-participating elements by construction. To enable precise control, we propose an Identification-First Motion Planning mechanism that grounds textual instructions in explicit visual entities. Furthermore, to overcome the non-differentiable nature of SVG rendering, we employ Rendering-Aware Reinforcement Learning via Group Relative Policy Optimization (GRPO). By leveraging a hybrid reward from a state-of-the-art video perception encoder, we align discrete code updates with high-fidelity visual feedback. We also introduce SVGAnim-134k, the first benchmark for vector animation. Extensive experiments demonstrate that VAnim significantly outperforms state-of-the-art baselines in semantic alignment and structural validity, with additional appendix metrics further validating motion quality and identity preservation.
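Because rendering is non-differentiable, GRPO scores several sampled outputs for the same prompt and normalizes each reward against its own group instead of learning a value function. A minimal sketch of that normalization step follows, using the commonly cited form (reward minus group mean, divided by group standard deviation). The reward values are made up, the function name is hypothetical, and the abstract does not say how the hybrid reward is composed, so each reward is treated here as a single opaque score per rendered sample.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed scores for four animations sampled from the same text prompt,
# e.g. produced by rendering each candidate and scoring the resulting video.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that score above their group's average receive a positive advantage and are reinforced; those below receive a negative one. When every sample in a group gets the same reward, all advantages are zero and the group contributes no learning signal.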
Problem

Research questions and friction points this paper is trying to address.

SVG animation
structural preservation
non-rigid deformation
text-to-animation
vector graphics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse State Updates
Rendering-Aware Reinforcement Learning
SVG Animation
DOM-Preserving Modeling
Group Relative Policy Optimization
Guotao Liang
School of Software, Beihang University, Beijing, China
Zhangcheng Wang
4Paradigm
Chuang Wang
School of Software, Beihang University, Beijing, China
Juncheng Hu
School of Software, Beihang University, Beijing, China
Haitao Zhou
School of Software, Beihang University, Beijing, China
Junhua Liu
University of Southern California
Multimedia Systems, VR/AR/XR, AI/ML Systems
Jing Zhang
School of Software, Beihang University
Computer Vision, Transfer Learning, Deep Learning
Dong Xu
Master of Computer Science, Fudan University
Long Context Model, RAG, Hallucination
Qian Yu
Professor, Dept of Earth, Geographic, and Climate Sciences, University of Massachusetts-Amherst
GIS, remote sensing, Spatial modeling