🤖 AI Summary
Existing SVG generation methods encode numerical parameters as sequences of discrete tokens, which slows training, limits precision, and hurts generalization. This work proposes Continuous Number Modeling (CNM), an approach that models numerical parameters directly as continuous values, avoiding the discretization artifacts of token-based encoding. A multimodal transformer is trained on 2 million raster-to-SVG samples and then fine-tuned with reinforcement learning using perceptual feedback. The method trains over 30% faster while producing vector graphics with higher perceptual fidelity than alternative approaches.
📝 Abstract
For certain image generation tasks, vector graphics such as Scalable Vector Graphics (SVGs) offer clear benefits, including greater flexibility, smaller file sizes, and easier editing, but remain less explored than raster-based approaches. A core challenge is that the numerical geometric parameters, which make up a large proportion of SVGs, are inefficiently encoded as long sequences of tokens. This slows training, reduces accuracy, and hurts generalization. To address these problems, we propose Continuous Number Modeling (CNM), an approach that directly models numbers as first-class, continuous values rather than discrete tokens. This formulation restores the mathematical elegance of the representation by aligning the model's inputs with the data's continuous nature, removing discretization artifacts introduced by token-based encoding. We then train a multimodal transformer on 2 million raster-to-SVG samples, followed by fine-tuning via reinforcement learning using perceptual feedback to further improve visual quality. Our approach improves training speed by over 30% while achieving higher perceptual fidelity than alternative approaches. This work establishes CNM as a practical and efficient approach for high-quality vector generation, with potential for broader applications. We make our code available at http://github.com/mikeogezi/CNM.
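
To make the sequence-length argument concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy character-level scheme for the discrete baseline and a hypothetical `<num>` placeholder slot for the continuous representation. The function names, the regular expression, and the example path are all illustrative assumptions; the actual tokenizer and number encoding used by CNM are not described in the abstract.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a simplified SVG path grammar with single-letter commands and
# plain decimal numbers (no exponents). Separators are ignored in both schemes.
PATH_ITEM_RE = re.compile(r"([A-Za-z])|([-+]?(?:\d+\.\d+|\d+|\.\d+))")


def discrete_tokens(path_data: str) -> List[str]:
    """Toy discrete encoding: one token per command, one token per number character."""
    tokens: List[str] = []
    for match in PATH_ITEM_RE.finditer(path_data):
        command, number = match.group(1), match.group(2)
        if command is not None:
            tokens.append(command)
        else:
            tokens.extend(number)  # "10.5" becomes "1", "0", ".", "5"
    return tokens


def continuous_sequence(path_data: str) -> List[Tuple[str, Optional[float]]]:
    """Toy continuous encoding: each number occupies a single slot carrying a float."""
    sequence: List[Tuple[str, Optional[float]]] = []
    for match in PATH_ITEM_RE.finditer(path_data):
        command, number = match.group(1), match.group(2)
        if command is not None:
            sequence.append((command, None))
        else:
            # "<num>" is a hypothetical placeholder symbol, not a name from the paper.
            sequence.append(("<num>", float(number)))
    return sequence


if __name__ == "__main__":
    example_path = "M 10.5 20.25 L 300 4.125 Z"
    print(len(discrete_tokens(example_path)))      # 20 positions
    print(len(continuous_sequence(example_path)))  # 7 positions
```

In this toy setting the same path takes 20 positions when numbers are spelled out character by character and 7 positions when each number is a single continuous slot. The float values are also kept exactly, rather than being split into digits or rounded to a fixed vocabulary.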