🤖 AI Summary
Existing methods support only static image generation from hand-drawn sketches and lack controllable video animation synthesis. This paper introduces the first sketch-driven high-fidelity video generation framework, enabling non-expert users to generate dynamic content directly from arbitrary hand-drawn sketches and brief text prompts. Methodologically, we propose a Level-Based Sketch Control Strategy and a TempSpatial Attention mechanism to adaptively modulate guidance strength across varying user sketching proficiencies and to significantly improve inter-frame temporal coherence. Built upon diffusion models, our framework supports zero-shot transfer and multi-sketch joint control. Quantitative and qualitative evaluations demonstrate that our approach substantially outperforms prior art in both visual fidelity and temporal consistency, achieving, for the first time, end-to-end, controllable generation of videos from sketches.
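To make the Level-Based Sketch Control idea concrete, here is a minimal, hypothetical Python sketch of how guidance strength could be modulated by an estimate of sketch detail. The density-based level estimate, the threshold values, and the names `estimate_sketch_level` and `guidance_strength` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def estimate_sketch_level(sketch: np.ndarray, n_levels: int = 5) -> int:
    """Bucket a binary sketch into a coarse 'drawing level' by stroke density.

    `sketch` is an (H, W) array with 1 for stroke pixels, 0 for background.
    The density thresholds are illustrative, not the paper's values.
    """
    density = sketch.mean()  # fraction of pixels covered by strokes
    # Denser, more detailed sketches land in a higher level bucket.
    thresholds = np.linspace(0.02, 0.15, n_levels - 1)
    return int(np.searchsorted(thresholds, density)) + 1

def guidance_strength(level: int, base: float = 1.0, n_levels: int = 5) -> float:
    """Map sketch level to a conditioning weight: detailed sketches are
    trusted more (stronger guidance), rough sketches less."""
    return base * level / n_levels

# Usage: a sparse doodle receives weaker control than a detailed drawing.
rough = (np.random.rand(256, 256) < 0.01).astype(np.float32)
detailed = (np.random.rand(256, 256) < 0.12).astype(np.float32)
print(guidance_strength(estimate_sketch_level(rough)))     # weaker guidance
print(guidance_strength(estimate_sketch_level(detailed)))  # stronger guidance
```

The resulting weight would then scale the sketch conditioning signal during diffusion sampling, so rough sketches constrain the generation loosely while polished ones constrain it tightly.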
📝 Abstract
With the advancement of generative artificial intelligence, previous studies have achieved the task of generating aesthetic images from hand-drawn sketches, meeting the public's demand for accessible drawing tools. However, these methods remain limited to static images and cannot control video animation generation with hand-drawn sketches. To address this gap, we propose VidSketch, the first method capable of generating high-quality video animations directly from any number of hand-drawn sketches and simple text prompts, bridging the divide between ordinary users and professional artists. Specifically, our method introduces a Level-Based Sketch Control Strategy that automatically adjusts the guidance strength of sketches during generation, accommodating users with varying drawing skills. Furthermore, a TempSpatial Attention mechanism is designed to enhance the spatiotemporal consistency of generated video animations, significantly improving coherence across frames. More detailed examples are available on our official website.
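As a rough illustration of what a TempSpatial Attention mechanism might look like, the following PyTorch sketch computes self-attention jointly over all frames' spatial tokens, so every patch can attend across both space and time in a single pass. The single-head layout and layer sizes are assumptions for clarity, not the architecture used in VidSketch.

```python
import torch
import torch.nn as nn

class TempSpatialAttention(nn.Module):
    """Minimal sketch of joint spatio-temporal self-attention: tokens from
    all frames attend to one another, so each patch can borrow appearance
    from every other frame as well as from its own frame."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens_per_frame, dim)
        b, t, n, d = x.shape
        x = x.reshape(b, t * n, d)           # merge time and space into one axis
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v       # every token sees every frame
        return self.proj(out).reshape(b, t, n, d)

# Usage: 8 frames of 16x16 latent patches with 64-dim features.
x = torch.randn(2, 8, 256, 64)
print(TempSpatialAttention(64)(x).shape)  # torch.Size([2, 8, 256, 64])
```

Letting tokens attend across frames in this way is one standard route to inter-frame coherence, since appearance details are shared globally rather than regenerated independently per frame.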