DrawMotion: Generating 3D Human Motions by Freehand Drawing

📅 2026-05-20

🤖 AI Summary

Existing text-to-3D human motion generation methods often fail to capture user intent precisely, because a motion is hard to describe through text alone. This work proposes DrawMotion, a diffusion-based, multi-condition motion generation framework that adds hand-drawn stick-figure sketches as a spatial guidance modality alongside the text condition, which supplies semantic control. A Multi-Condition Module (MCM) integrated into the diffusion process supports every combination of the input conditions, and a training-free classifier-guidance mechanism further aligns generated motions with user intent. Quantitative experiments and user studies show that sketch input reduces the time users need to generate a target motion by approximately 46.7% compared with text-only input.
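To make the idea of condition combinations concrete, the following minimal sketch enumerates every subset of a set of condition names. It is illustrative only and is not taken from the DrawMotion code: the function name `condition_subsets` and the labels `"text"` and `"sketch"` are assumptions chosen for this example.

```python
from itertools import combinations


def condition_subsets(conditions):
    """Return every subset of the given condition names, from empty to full.

    Hypothetical helper for illustration; not part of DrawMotion.
    """
    subsets = []
    for size in range(len(conditions) + 1):
        # combinations() yields tuples of the requested size, in input order.
        for combo in combinations(conditions, size):
            subsets.append(combo)
    return subsets


# Assumed condition labels: a text prompt and a stick-figure sketch.
subsets = condition_subsets(["text", "sketch"])
for subset in subsets:
    print(subset if subset else "(unconditional)")
```

With two conditions there are four subsets: no condition, text only, sketch only, and both. A module that handles all of them in one model avoids training a separate network per combination.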

📝 Abstract

Text-to-motion generation, which translates textual descriptions into human motions, faces the challenge that users often struggle to precisely convey their intended motions through text alone. To address this issue, this paper introduces DrawMotion, an efficient diffusion-based framework designed for multi-condition scenarios. DrawMotion generates motions based on both a conventional text condition and a novel hand-drawing condition, which provide semantic and spatial control over the generated motions, respectively. Specifically, we tackle the fine-grained motion generation task from three perspectives: 1) freehand drawing condition. To accurately capture users' intended motions without requiring tedious textual input, we develop an algorithm to automatically generate hand-drawn stickman sketches across different dataset formats; 2) multi-condition fusion. We propose a Multi-Condition Module (MCM) that is integrated into the diffusion process, enabling the model to exploit all possible condition combinations while reducing computational complexity compared to conventional approaches; and 3) training-free guidance. Notably, the MCM in DrawMotion ensures that its intermediate features lie in a continuous space, allowing classifier-guidance gradients to update the features and thereby aligning the generated motions with user intentions while preserving fidelity. Quantitative experiments and user studies demonstrate that the freehand drawing approach reduces user time by approximately 46.7% when generating motions aligned with their imagination. The code, demos, and relevant data are publicly available at https://github.com/InvertedForest/DrawMotion.
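The training-free guidance step can be pictured as a gradient update on continuous features. The sketch below is a toy illustration under strong assumptions, not the paper's implementation: the features are a plain list of floats, the guidance loss is a squared distance to a fixed target, and the names `guidance_loss`, `guidance_gradient`, `guided_update`, and `scale` are all hypothetical. In a real system the loss would presumably compare the generated motion with the user's sketch, and the gradient would come from automatic differentiation.

```python
def guidance_loss(features, target):
    """Toy guidance loss: squared distance between features and a target."""
    return sum((f - t) ** 2 for f, t in zip(features, target))


def guidance_gradient(features, target):
    """Analytic gradient of the toy loss with respect to the features."""
    return [2.0 * (f - t) for f, t in zip(features, target)]


def guided_update(features, target, scale=0.1):
    """One guidance step: move the features against the loss gradient."""
    grad = guidance_gradient(features, target)
    return [f - scale * g for f, g in zip(features, grad)]


# Assumed toy values standing in for intermediate features and a sketch target.
features = [0.0, 1.0, -2.0]
target = [1.0, 1.0, 0.0]

before = guidance_loss(features, target)
for _ in range(5):
    features = guided_update(features, target)
after = guidance_loss(features, target)

print(f"loss before: {before:.3f}, loss after: {after:.3f}")
```

The update is only well defined because the features are continuous: a small step along the negative gradient yields another valid feature vector, which is the property the abstract attributes to the MCM's intermediate features. No model weights change, which is why this kind of guidance needs no extra training.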

Problem

Research questions and friction points this paper is trying to address.

text-to-motion generation

human motion

user intention

motion control

freehand drawing

Innovation

Methods, ideas, or system contributions that make the work stand out.

freehand drawing

multi-condition fusion

diffusion model