Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions

📅 2025-04-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing text-to-motion generation methods commonly adopt a uniform human body model, neglecting the natural influence of body shape on motion dynamics and thereby producing kinematically implausible motions. To address this, we propose the first shape-aware text-driven motion synthesis framework. Our approach explicitly incorporates continuous body shape parameters (e.g., SMPL β) into the generative pipeline: motion sequences are discretized via FSQ-VAE; a joint language model jointly predicts shape and motion tokens conditioned on text; and motion is decoded under explicit shape conditioning, enabling learnable shape–motion associations. Quantitative evaluation, qualitative analysis, and user studies on AMASS and HumanML3D demonstrate that our method significantly improves motion plausibility and shape–motion consistency. It establishes new state-of-the-art performance in shape-aware text-to-motion generation.

Technology Category

Application Category

📝 Abstract
We explore how body shapes influence human motion synthesis, an aspect often overlooked in existing text-to-motion generation methods due to the ease of learning a homogenized, canonical body shape. However, this homogenization can distort the natural correlations between different body shapes and their motion dynamics. Our method addresses this gap by generating body-shape-aware human motions from natural language prompts. We utilize a finite scalar quantization-based variational autoencoder (FSQ-VAE) to quantize motion into discrete tokens and then leverage continuous body shape information to de-quantize these tokens back into continuous, detailed motion. Additionally, we harness the capabilities of a pretrained language model to predict both continuous shape parameters and motion tokens, facilitating the synthesis of text-aligned motions and decoding them into shape-aware motions. We evaluate our method quantitatively and qualitatively, and also conduct a comprehensive perceptual study to demonstrate its efficacy in generating shape-aware motions.
Problem

Research questions and friction points this paper is trying to address.

Exploring body shape impact on motion synthesis
Generating shape-aware motions from text prompts
Overcoming homogenization in text-to-motion methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

FSQ-VAE quantizes motion into discrete tokens
Continuous shape info de-quantizes tokens into motion
Pretrained language model predicts shape and motion
🔎 Similar Papers
No similar papers found.