🤖 AI Summary
This work addresses two problems with existing text-to-motion generation methods. They struggle to accurately express motions of specific body parts, and they often produce globally incoherent full-body movements when they combine independently generated part motions. To address this, the authors propose the ParTY framework. ParTY first generates part-level motions and uses them as guidance for generating holistic motions. A part-aware text grounding strategy then transforms text embeddings and aligns them with individual body parts, which enables fine-grained correspondence between linguistic descriptions and body regions. ParTY also introduces a holistic-part fusion module that adaptively combines holistic and part motions to balance part-level expressiveness with full-body coherence. Experiments, including part-level and coherence-level evaluations, show that ParTY achieves substantial improvements over previous methods in both local motion accuracy and global motion consistency.
📝 Abstract
Text-to-motion synthesis aims to generate natural and expressive human motions from textual descriptions. Existing approaches primarily focus on generating holistic motions from text descriptions, and they struggle to accurately reflect actions involving specific body parts. Recent part-wise motion generation methods attempt to resolve this but face two critical limitations: (i) they lack explicit mechanisms for aligning textual semantics with individual body parts, and (ii) they often generate incoherent full-body motions because they integrate independently generated part motions. To overcome these issues and resolve the fundamental trade-off in existing methods, we propose ParTY, a novel framework that enhances part expressiveness while generating coherent full-body motions. ParTY comprises: (1) a Part-Guided Network, which first generates part motions to obtain part guidance and then uses this guidance to generate holistic motions; (2) Part-aware Text Grounding, which diversely transforms text embeddings and aligns them with each body part; and (3) Holistic-Part Fusion, which adaptively fuses holistic motions and part motions. Extensive experiments, including part-level and coherence-level evaluations, demonstrate that ParTY achieves substantial improvements over previous methods.
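The abstract does not specify how Holistic-Part Fusion decides how much to trust each stream. As a purely illustrative sketch, assume a per-channel sigmoid gate, a common adaptive-fusion pattern that is not necessarily ParTY's actual design. Under that assumption, adaptively blending a holistic motion feature with a part motion feature could look like the code below. The names `gated_fusion`, `fuse_sequence`, `w_h`, `w_p`, and `bias` are hypothetical and do not come from the paper. In a real model the weights would be learned.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(holistic: Vector, part: Vector,
                 w_h: Vector, w_p: Vector, bias: Vector) -> Vector:
    """Hypothetical per-channel gated fusion (an assumption, not ParTY's module).

    For each channel: g = sigmoid(w_h * h + w_p * p + b), out = g * h + (1 - g) * p.
    g near 1 keeps the holistic feature; g near 0 keeps the part feature.
    """
    if not (len(holistic) == len(part) == len(w_h) == len(w_p) == len(bias)):
        raise ValueError("all inputs must have the same dimension")
    fused = []
    for h, p, a, c, b in zip(holistic, part, w_h, w_p, bias):
        g = sigmoid(a * h + c * p + b)
        fused.append(g * h + (1.0 - g) * p)
    return fused


def fuse_sequence(holistic_seq: List[Vector], part_seq: List[Vector],
                  w_h: Vector, w_p: Vector, bias: Vector) -> List[Vector]:
    """Apply the same (hypothetical) gate frame by frame over a motion sequence."""
    if len(holistic_seq) != len(part_seq):
        raise ValueError("sequences must have the same number of frames")
    return [gated_fusion(h, p, w_h, w_p, bias) for h, p in zip(holistic_seq, part_seq)]


if __name__ == "__main__":
    zeros = [0.0, 0.0]
    # With zero weights and bias the gate is 0.5, so the output is the average.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], zeros, zeros, zeros))
```

With all-zero parameters the gate equals 0.5 and the fusion reduces to averaging. A large positive bias pushes the output toward the holistic feature, and a large negative bias pushes it toward the part feature. The per-channel gate lets the blend vary by feature dimension instead of using one global mixing weight.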