StickMotion: Generating 3D Human Motions by Drawing a Stickman

📅 2025-03-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges of ambiguous intent expression and coarse-grained control in text-driven 3D human motion generation. To this end, we propose a novel paradigm that jointly leverages textual descriptions and hand-drawn stickman sketches as complementary conditioning inputs. Methodologically, we (i) introduce hand-drawn stickmen as explicit, interpretable pose priors; (ii) design a lightweight multi-condition cross-fusion module—replacing costly self-attention—to enable efficient multimodal integration; (iii) incorporate a dynamic supervision strategy that adaptively refines joint positions for enhanced motion naturalness; and (iv) develop an automated stickman synthesis algorithm to support scalable training. Experiments demonstrate that users reduce editing time by 51.5% using stickmen, with significantly improved intent consistency. Our approach achieves state-of-the-art quantitative performance across multiple benchmarks, validating the effectiveness of fine-grained pose control and efficient multimodal fusion.

Technology Category

Application Category

📝 Abstract
Text-to-motion generation, which translates textual descriptions into human motions, has been challenging in accurately capturing detailed user-imagined motions from simple text inputs. This paper introduces StickMotion, an efficient diffusion-based network designed for multi-condition scenarios, which generates desired motions based on traditional text and our proposed stickman conditions for global and local control of these motions, respectively. We address the challenges introduced by the user-friendly stickman from three perspectives: 1) Data generation. We develop an algorithm to generate hand-drawn stickmen automatically across different dataset formats. 2) Multi-condition fusion. We propose a multi-condition module that integrates into the diffusion process and obtains outputs of all possible condition combinations, reducing computational complexity and enhancing StickMotion's performance compared to conventional approaches with the self-attention module. 3) Dynamic supervision. We empower StickMotion to make minor adjustments to the stickman's position within the output sequences, generating more natural movements through our proposed dynamic supervision strategy. Through quantitative experiments and user studies, sketching stickmen saves users about 51.5% of their time generating motions consistent with their imagination. Our codes, demos, and relevant data will be released to facilitate further research and validation within the scientific community.
Problem

Research questions and friction points this paper is trying to address.

Generates 3D human motions from text and stickman drawings.
Improves motion accuracy with multi-condition fusion techniques.
Reduces user time by 51.5% for motion generation tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based network for multi-condition motion generation
Algorithm for automatic stickman data generation
Dynamic supervision for natural movement adjustments
🔎 Similar Papers
No similar papers found.
T
Tao Wang
Beijing University of Posts and Telecommunications
Z
Zhihua Wu
University of Science and Technology of China
Qiaozhi He
Qiaozhi He
ByteDance
LLMNatural Language Processing
J
Jiaming Chu
Beijing University of Posts and Telecommunications
L
Ling Qian
China Mobile (Suzhou) Software Technology Co, Ltd.
Y
Yu Cheng
National University of Singapore
J
Junliang Xing
Tsinghua University
J
Jian Zhao
China Telecom Institute of AI, Northwestern Polytechnical University
L
Lei Jin
Beijing University of Posts and Telecommunications