SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation

📅 2025-02-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing image-to-vector sketching methods rely on iterative optimization, resulting in slow inference (on the order of seconds or more) and limited practicality. This paper introduces a diffusion-based framework for vector sketch generation that eliminates test-time optimization and instead directly denoises stroke control points, producing a sketch in under a second per image. The key contributions are twofold: (1) ControlSketch, an SDS-based sketch generation method augmented with a depth-aware ControlNet for precise spatial control, used to build a scalable, high-quality synthetic dataset of image-sketch pairs; and (2) SwiftSketch, a transformer-decoder diffusion model designed to handle the discrete nature of vector representations and capture the global dependencies between strokes. Experiments show that the method generalizes across diverse concepts, producing sketches that combine high fidelity with a natural, visually appealing style.

๐Ÿ“ Abstract
Recent advancements in large vision-language models have enabled highly expressive and diverse vector sketch generation. However, state-of-the-art methods rely on a time-consuming optimization process involving repeated feedback from a pretrained model to determine stroke placement. Consequently, despite producing impressive sketches, these methods are limited in practical applications. In this work, we introduce SwiftSketch, a diffusion model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second. SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution. Its transformer-decoder architecture is designed to effectively handle the discrete nature of vector representation and capture the inherent global dependencies between strokes. To train SwiftSketch, we construct a synthetic dataset of image-sketch pairs, addressing the limitations of existing sketch datasets, which are often created by non-artists and lack professional quality. For generating these synthetic sketches, we introduce ControlSketch, a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet. We demonstrate that SwiftSketch generalizes across diverse concepts, efficiently producing sketches that combine high fidelity with a natural and visually appealing style.
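The abstract describes SwiftSketch as a diffusion model that starts from Gaussian noise over stroke control points and progressively denoises them, conditioned on an image. The following is a minimal, hypothetical sketch of that sampling loop using standard DDPM ancestral sampling; the shapes, schedule, and `dummy_denoiser` stand-in for the paper's transformer decoder are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical layout: a sketch = 16 strokes, each a cubic Bezier
# with 4 control points in 2D. Values are assumptions for illustration.
N_STROKES, N_CTRL, DIM = 16, 4, 2
T = 50  # number of diffusion steps (illustrative)

# Standard DDPM linear beta schedule and cumulative alpha products.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def dummy_denoiser(x_t, t, image_embedding):
    """Stand-in for the paper's transformer decoder: given the noisy
    control points, the timestep, and an image condition, predict the
    noise. Returns zeros here so the loop runs end to end."""
    return np.zeros_like(x_t)

def sample_sketch(image_embedding, rng):
    """DDPM ancestral sampling over stroke control points."""
    # Start from pure Gaussian noise, as the abstract describes.
    x = rng.standard_normal((N_STROKES, N_CTRL, DIM))
    for t in reversed(range(T)):
        eps = dummy_denoiser(x, t, image_embedding)
        a_t, ab_t = alphas[t], alpha_bars[t]
        # Posterior mean of x_{t-1} given the predicted noise.
        x = (x - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps) / np.sqrt(a_t)
        if t > 0:  # add sampling noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x  # denoised control points, one row per stroke

rng = np.random.default_rng(0)
sketch = sample_sketch(image_embedding=None, rng=rng)
print(sketch.shape)  # (16, 4, 2)
```

Because the model emits control points directly, rendering a sketch afterwards is a single feed-forward pass plus rasterization, which is what makes sub-second inference possible compared with optimization-based methods.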
Problem

Research questions and friction points this paper is trying to address.

Efficient image-to-vector sketch generation
Overcoming time-consuming optimization processes
Capturing global dependencies in discrete vector stroke representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

SwiftSketch diffusion model
Transformer-decoder architecture
ControlSketch spatial control