CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model

📅 2025-04-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses the lack of controllability and editability in sketch generation for artistic creation. Methodologically, it proposes a controllable, progressive text-to-sketch generation framework: (1) it represents binary sketches with Unsigned Distance Fields (UDFs), a continuous encoding that stabilizes diffusion training and yields sketches with crisp, well-defined edges; (2) it curates the first large-scale, high-quality text-sketch paired dataset; and (3) it designs a multi-round refinement architecture that incorporates human editing feedback, enabling a closed-loop generation pipeline from text or bounding-box prompts to coarse sketches, through interactive editing, to high-fidelity binary sketches. Experiments demonstrate significant improvements over state-of-the-art baselines in semantic alignment, structural clarity, and user controllability, effectively supporting artist-in-the-loop generative workflows.

πŸ“ Abstract
Sketches serve as fundamental blueprints in artistic creation because sketch editing is easier and more intuitive than pixel-level RGB image editing for painting artists, yet sketch generation remains unexplored despite advancements in generative models. We propose a novel framework CoProSketch, providing prominent controllability and details for sketch generation with diffusion models. A straightforward method is fine-tuning a pretrained image generation diffusion model with binarized sketch images. However, we find that the diffusion models fail to generate clear binary images, which makes the produced sketches chaotic. We thus propose to represent the sketches by unsigned distance field (UDF), which is continuous and can be easily decoded to sketches through a lightweight network. With CoProSketch, users generate a rough sketch from a bounding box and a text prompt. The rough sketch can be manually edited and fed back into the model for iterative refinement and will be decoded to a detailed sketch as the final result. Additionally, we curate the first large-scale text-sketch paired dataset as the training data. Experiments demonstrate superior semantic consistency and controllability over baselines, offering a practical solution for integrating user feedback into generative workflows.
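The UDF idea from the abstract can be illustrated with a toy round trip: a binary sketch is encoded as a continuous unsigned distance field and then decoded back to strokes. This is a minimal sketch using a Euclidean distance transform and naive thresholding as the decoder; the paper instead trains a lightweight network for decoding, so treat this as an assumption-laden illustration, not the authors' implementation.

```python
# Illustrative only: UDF encoding of a binary sketch and a naive threshold decoder.
import numpy as np
from scipy.ndimage import distance_transform_edt

def sketch_to_udf(binary_sketch: np.ndarray) -> np.ndarray:
    """Unsigned distance from each pixel to the nearest stroke pixel (stroke == 1).
    The result is a continuous field that is 0 exactly on the strokes."""
    return distance_transform_edt(binary_sketch == 0)

def udf_to_sketch(udf: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Naive decoder: pixels within `threshold` of a stroke become stroke pixels.
    (CoProSketch uses a learned lightweight decoder network instead.)"""
    return (udf <= threshold).astype(np.uint8)

# Toy 5x5 sketch with a single vertical stroke in column 2.
sketch = np.zeros((5, 5), dtype=np.uint8)
sketch[:, 2] = 1
udf = sketch_to_udf(sketch)      # continuous field, 0.0 on the stroke
recovered = udf_to_sketch(udf)   # round trip recovers the original stroke
assert np.array_equal(recovered, sketch)
```

Because the UDF is continuous, it is a friendlier target for a diffusion model than a hard binary image, which is the motivation the abstract gives for this representation.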
Problem

Research questions and friction points this paper is trying to address.

Generating clear binary sketches using diffusion models
Providing controllability and iterative refinement for sketch generation
Creating a large-scale text-sketch dataset for training
Innovation

Methods, ideas, or system contributions that make the work stand out.

UDF representation for continuous sketch decoding
Iterative refinement with user feedback
Large-scale text-sketch paired dataset
🔎 Similar Papers
2024-07-12 · Neural Information Processing Systems · Citations: 0
👥 Authors
Ruohao Zhan, State Key Laboratory of CAD & CG, Zhejiang University
Yijin Li, State Key Lab of CAD & CG, Zhejiang University, China (Computer Vision)
Yisheng He, HKUST (Computer Vision · Deep Learning · Embodied AI)
Shuo Chen, State Key Laboratory of CAD & CG, Zhejiang University
Yichen Shen, State Key Laboratory of CAD & CG, Zhejiang University
Xinyu Chen, State Key Laboratory of CAD & CG, Zhejiang University
Zilong Dong, Institute for Intelligent Computing, Alibaba Group (NeRF · 3D Human · 3D Generation · 3D Understanding)
Zhaoyang Huang, Chinese University of Hong Kong (Computer Vision)
Guofeng Zhang, State Key Laboratory of CAD & CG, Zhejiang University