KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing sketch-driven diffusion models struggle to tolerate flaws in novice sketches while still following professional sketches with high fidelity, because fine-grained geometric precision and high-level semantic guidance pull in opposite directions. To address this, we propose a dual-path framework that pairs a coarse-grained semantic controller (CGC) with a fine-grained refinement controller (FGC), and introduce an adjustable “knob” inference mechanism that adapts generation fidelity to sketch complexity. The modular, plug-and-play design improves generalizability and deployment flexibility. Experiments on MultiGen-20M and a newly constructed sketch dataset show that novice-sketch generation quality rises markedly (FID ↓18.3%, structural accuracy ↑22.7%) while fidelity for professional sketches is preserved. Image naturalness and geometric consistency both improve as well.

📝 Abstract
Recent advances in diffusion models have significantly improved text-to-image (T2I) generation, but they often struggle to balance fine-grained precision with high-level control. Methods like ControlNet and T2I-Adapter excel at following sketches by seasoned artists but tend to be overly rigid, replicating unintentional flaws in sketches from novice users. Meanwhile, coarse-grained methods, such as sketch-based abstraction frameworks, offer more accessible input handling but lack the precise control needed for detailed, professional use. To address these limitations, we propose KnobGen, a dual-pathway framework that democratizes sketch-based image generation by seamlessly adapting to varying levels of sketch complexity and user skill. KnobGen uses a Coarse-Grained Controller (CGC) module for high-level semantics and a Fine-Grained Controller (FGC) module for detailed refinement. The relative strength of these two modules can be adjusted through our knob inference mechanism to align with the user's specific needs. These mechanisms ensure that KnobGen can flexibly generate images from both novice sketches and those drawn by seasoned artists. KnobGen thus maintains control over the final output while preserving the natural appearance of the image, as evidenced on the MultiGen-20M dataset and a newly collected sketch dataset.

Problem

Research questions and friction points this paper is trying to address.

Balancing precision and control in sketch-based diffusion models
Adapting to varying sketch complexity and user skill levels
Ensuring flexible image generation from novice to expert sketches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-pathway framework for sketch adaptation
Coarse and fine-grained control modules
Adjustable knob mechanism for user needs
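
The page does not say how the knob combines the two controllers. As a rough illustration only, the sketch below assumes the knob is a scalar in [0, 1] that linearly weights a coarse control signal against a fine one. The function name `blend_control`, the flat-list representation of the signals, and the linear rule itself are all hypothetical and are not taken from the paper.

```python
from typing import List


def blend_control(coarse: List[float], fine: List[float], knob: float) -> List[float]:
    """Mix two control signals with a single scalar knob.

    Hypothetical illustration: knob = 0.0 keeps only the coarse signal
    (tolerant of rough sketches), knob = 1.0 keeps only the fine signal
    (follows the sketch closely). The linear rule is an assumption.
    """
    if not 0.0 <= knob <= 1.0:
        raise ValueError("knob must lie in [0, 1]")
    if len(coarse) != len(fine):
        raise ValueError("control signals must have the same length")
    # Element-wise convex combination of the two signals.
    return [(1.0 - knob) * c + knob * f for c, f in zip(coarse, fine)]


# Toy stand-ins for the outputs of the two controllers.
coarse_signal = [1.0, 2.0]
fine_signal = [3.0, 4.0]

novice_mix = blend_control(coarse_signal, fine_signal, 0.0)   # coarse only
artist_mix = blend_control(coarse_signal, fine_signal, 1.0)   # fine only
halfway_mix = blend_control(coarse_signal, fine_signal, 0.5)  # equal weight
```

Under this assumption, a low knob value would suit a rough novice sketch and a high value a detailed professional one. A real system could just as well apply the knob elsewhere, for example to how long a controller stays active during denoising.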