🤖 AI Summary
Existing animation generation methods constrain sketches to fixed instructions or predefined forms, overlooking their expressive freedom and the user’s central role in shaping dynamic intent. This work proposes a novel interactive paradigm that, for the first time, integrates freehand sketching with vision-language models, enabling users to intuitively convey motion intentions through minimal sketches and seamlessly incorporate them into animation workflows—from storyboarding to 2D and 3D generation. Through an interactive interface and a three-stage user study (N=24), we demonstrate that our approach effectively guides video generation with minimal input, resolves sketch ambiguity, and successfully extends to 3D scenarios, significantly enhancing user control and expressive freedom over generated outcomes.
📝 Abstract
Sketching provides an intuitive way to convey dynamic intent in animation authoring (i.e., how elements change over time and space), making it a natural medium for automatic content creation. Yet existing approaches often constrain sketches to fixed command tokens or predefined visual forms, overlooking their freeform nature and the central role of humans in shaping intention. To address this, we introduce an interaction paradigm in which users convey dynamic intent to a vision-language model via free-form sketching, instantiated here in a sketch-storyboard-to-motion-graphics workflow. We implement an interface and refine it through a three-stage study with 24 participants. The study shows how sketches convey motion with minimal input, how their inherent ambiguity requires user involvement for clarification, and how sketches can visually guide video refinement. Our findings reveal the potential of sketch and AI interaction to bridge the gap between intention and outcome, and demonstrate its applicability to 3D animation and video generation.