SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models

📅 2024-03-14

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing region-controllable diffusion models are slow (52 seconds for a 512×512 image) and incompatible with acceleration techniques such as Latent Consistency Models (LCM), which hinders interactive creative applications. To address this, SemanticDraw unifies region-based semantic control with diffusion acceleration, specifically LCM, in a single framework and introduces a streaming batched inference pipeline. Its multi-prompt streaming engine combines region masks, prompt embedding alignment, and latent-space consistency modeling to preserve both multi-prompt expressivity and region-level semantic fidelity. On a single RTX 2080 Ti, SemanticDraw generates 512×512 images in 0.64 seconds, using 10× fewer inference steps than prior region-controllable methods and reaching sub-second latency. This makes real-time, interactive content generation with precise spatial and semantic control considerably more practical.
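For scale, the two latency figures quoted above can be compared directly. This assumes the 52-second baseline and the 0.64-second result were measured under comparable settings (same resolution and GPU), which the text does not state explicitly:

```latex
\[
\text{speedup} = \frac{52\ \mathrm{s}}{0.64\ \mathrm{s}} = 81.25 \approx 81\times,
\qquad
\frac{81.25}{10} \approx 8.1
\]
```

Under that assumption, the tenfold reduction in inference steps reported in the abstract accounts for a factor of 10, and the remaining factor of roughly 8 would have to come from other parts of the pipeline (for example, higher throughput from batching). This breakdown is our inference, not one reported by the authors.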

📝 Abstract

We introduce SemanticDraw, a new paradigm of interactive content creation where high-quality images are generated in near real-time from multiple given hand-drawn regions, each encoding a prescribed semantic meaning. To maximize the productivity of content creators and fully realize their artistic imagination, their tools need both quick interactive interfaces and fine-grained regional controls. Despite the astonishing generation quality of recent diffusion models, we find that existing approaches for regional controllability are very slow (52 seconds for a $512 \times 512$ image) and are not compatible with acceleration methods such as LCM, blocking their huge potential in interactive content creation. From this observation, we build our solution for interactive content creation in two steps: (1) we establish compatibility between region-based controls and acceleration techniques for diffusion models, maintaining high fidelity of multi-prompt image generation with a $\times 10$ reduction in the number of inference steps, and (2) we increase the generation throughput with our new multi-prompt stream batch pipeline, enabling low-latency generation from multiple region-based text prompts on a single RTX 2080 Ti GPU. Our proposed framework is generalizable to any existing diffusion model and acceleration scheduler, allowing sub-second (0.64 seconds) image content creation applications built on well-established image diffusion models. Our project page is: https://jaerinlee.com/research/semantic-draw
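The idea of region-based control can be illustrated with a minimal sketch of mask-weighted blending: at a denoising step, one latent is predicted per text prompt, and the predictions are combined pixel by pixel according to the hand-drawn region masks. This follows the general MultiDiffusion-style averaging idea and is an assumption on our part, not the paper's exact procedure; the helper name `fuse_regions` and the array shapes are hypothetical, and the sketch uses NumPy.

```python
import numpy as np


def fuse_regions(latents: np.ndarray, masks: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Blend per-prompt latents into a single latent using region masks.

    Hypothetical helper (not from the paper).
    latents: shape (P, C, H, W), one predicted latent per prompt.
    masks:   shape (P, H, W), non-negative weight of each prompt at each pixel.
    Returns: shape (C, H, W).
    """
    latents = np.asarray(latents, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if latents.ndim != 4 or masks.shape != (latents.shape[0],) + latents.shape[2:]:
        raise ValueError("expected latents (P, C, H, W) and masks (P, H, W)")
    weights = masks[:, None, :, :]            # (P, 1, H, W), broadcast over channels
    numerator = (weights * latents).sum(axis=0)
    denominator = weights.sum(axis=0)         # (1, H, W)
    # Pixels with no covering mask get zero weight and therefore output 0.
    return numerator / np.maximum(denominator, eps)


# Usage: prompt 0 owns the left half of an 8x8 latent, prompt 1 the right half.
P, C, H, W = 2, 4, 8, 8
latents = np.stack([np.full((C, H, W), 1.0), np.full((C, H, W), -1.0)])
masks = np.zeros((P, H, W))
masks[0, :, : W // 2] = 1.0
masks[1, :, W // 2 :] = 1.0
fused = fuse_regions(latents, masks)
print(fused[0, 0])  # left four entries 1.0, right four entries -1.0
```

Where masks overlap, the prompts are averaged in proportion to their weights; in practice a background prompt covering the whole canvas would avoid the zero-weight gaps noted in the comment (also an assumption).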

Problem

Enable real-time interactive image generation from hand-drawn regions

Achieve fine-grained regional control with diffusion model acceleration

Reduce latency for multi-prompt generation on a single consumer GPU (RTX 2080 Ti)

Innovation

Region-based controls made compatible with diffusion acceleration (e.g., LCM)

Multi-prompt stream batch for low-latency generation

Sub-second (0.64 s) image creation on a single RTX 2080 Ti GPU
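The multi-prompt stream batch listed above builds on stream batching, which we attribute to StreamDiffusion (our attribution; the text here does not name it). Instead of running all denoising steps for one image before starting the next, the pipeline keeps several images in flight at different steps and advances them together in one batched call. The sketch below simulates only that scheduling on toy numbers; `stream_batch`, `toy_step`, and the one-new-input-per-tick policy are hypothetical, and the paper's pipeline additionally handles multiple region prompts per image, which is not modeled here.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Tuple


def stream_batch(
    inputs: Iterable[float],
    num_steps: int,
    denoise_step: Callable[[List[float], List[int]], List[float]],
) -> Iterator[Tuple[int, float]]:
    """Simulate a stream-batch schedule of depth `num_steps` (hypothetical helper).

    On every tick, one new input joins the batch, a single batched call advances
    every in-flight item by one denoising step (each at its own step index),
    and the oldest item leaves once it has received `num_steps` updates.
    Yields (tick, finished_state).
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    pending = deque(inputs)
    in_flight = deque()  # each entry is [state, steps_done]
    tick = 0
    while pending or in_flight:
        if pending:
            in_flight.append([pending.popleft(), 0])
        states = [entry[0] for entry in in_flight]
        step_indices = [entry[1] for entry in in_flight]
        new_states = denoise_step(states, step_indices)  # one call for the whole batch
        for entry, state in zip(in_flight, new_states):
            entry[0] = state
            entry[1] += 1
        tick += 1
        # The oldest item is the only one that can have finished on this tick.
        if in_flight[0][1] == num_steps:
            yield tick, in_flight.popleft()[0]


def toy_step(states: List[float], step_indices: List[int]) -> List[float]:
    # Stand-in for a batched denoiser: just add 1 to every state.
    return [s + 1 for s in states]


for tick, out in stream_batch([100, 200, 300], num_steps=4, denoise_step=toy_step):
    print(tick, out)  # prints "4 104", then "5 204", then "6 304"
```

After a warm-up of `num_steps` ticks, one finished item leaves per tick, so throughput approaches one image per batched call even though each image still waits `num_steps` calls. That trade of per-image latency for throughput is the design choice this kind of pipeline makes.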

🔎 Similar Papers

Streamlining Image Editing with Layered Diffusion Brushes