🤖 AI Summary
To address limited professional controllability, ambiguous text prompts, and dependence on training in sketch coloring, this paper proposes a fine-tuning-free, guided multi-stage coloring framework. Methodologically, it integrates ControlNet and region-specific masks into Stable Diffusion v1.5 and introduces three components: (i) a sketch inversion mechanism for faithful latent initialization; (ii) a guidance-aware sampling strategy for structured color propagation; and (iii) a scaled self-attention module that balances local detail fidelity against global color consistency. Semantic understanding from BLIP-2 is combined with user-specified palettes to enable fine-grained, interactive control. The framework runs inference on the order of seconds on a single RTX 4090 Super and delivers high-fidelity outputs with strong intent alignment, making it suitable for professional applications such as design prototyping and storyboarding. To our knowledge, this is the first work to achieve zero-shot, highly controllable, and real-time sketch coloring, advancing both generation quality and interactive efficiency.
📝 Abstract
This paper introduces a novel approach to sketch colourisation, inspired by the universal childhood activity of colouring and its professional applications in design and storyboarding. Striking a balance between precision and convenience, our method utilises region masks and colour palettes to allow intuitive user control, steering clear of the meticulousness of manual colour assignments or the limitations of textual prompts. By strategically combining ControlNet and staged generation, incorporating Stable Diffusion v1.5, and leveraging BLIP-2 text prompts, our methodology facilitates faithful image generation and user-directed colourisation. Addressing challenges of local and global consistency, we employ inventive solutions such as an inversion scheme, guided sampling, and a self-attention mechanism with a scaling factor. The resulting tool is not only fast and training-free but also compatible with consumer-grade Nvidia RTX 4090 Super GPUs, making it a valuable asset for both creative professionals and enthusiasts in various fields. Project Page: https://chaitron.github.io/SketchDeco/
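The abstract mentions a self-attention mechanism with a scaling factor but does not say how the factor is applied. As a rough illustration only, the NumPy sketch below assumes the factor multiplies the attention logits before the softmax, which is one common way to sharpen or flatten attention; the function name `scaled_self_attention`, the `scale` parameter, and the projection matrices are hypothetical and are not taken from the paper.

```python
import numpy as np

def scaled_self_attention(x, w_q, w_k, w_v, scale=1.0):
    """Single-head self-attention with an extra logit scaling factor.

    ASSUMPTION: `scale` multiplies the standard dot-product logits.
    The paper's actual formulation may differ.
    x: (n_tokens, d_model); w_q, w_k, w_v: (d_model, d_head).
    """
    q = x @ w_q
    k = x @ w_k
    v = x @ w_v
    d_head = q.shape[-1]
    # Standard scaled dot-product logits, then the extra factor.
    logits = scale * (q @ k.T) / np.sqrt(d_head)
    # Subtract the row-wise maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Under this assumption, `scale > 1` concentrates each token's attention on its best-matching tokens (favouring local detail), while `scale < 1` spreads attention more evenly across the image (favouring global consistency); `scale = 0` gives uniform attention.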