🤖 AI Summary
Existing text-to-motion generation methods commonly adopt a uniform human body model, neglecting the natural influence of body shape on motion dynamics and thereby producing kinematically implausible motions. To address this, we propose the first shape-aware text-driven motion synthesis framework. Our approach explicitly incorporates continuous body shape parameters (e.g., SMPL β) into the generative pipeline: motion sequences are discretized via an FSQ-VAE; a single language model jointly predicts shape parameters and motion tokens conditioned on text; and motion is decoded under explicit shape conditioning, enabling learnable shape–motion associations. Quantitative evaluation, qualitative analysis, and user studies on AMASS and HumanML3D demonstrate that our method significantly improves motion plausibility and shape–motion consistency, establishing new state-of-the-art performance in shape-aware text-to-motion generation.
📝 Abstract
We explore how body shapes influence human motion synthesis, an aspect often overlooked in existing text-to-motion generation methods due to the ease of learning a homogenized, canonical body shape. However, this homogenization can distort the natural correlations between different body shapes and their motion dynamics. Our method addresses this gap by generating body-shape-aware human motions from natural language prompts. We utilize a finite scalar quantization-based variational autoencoder (FSQ-VAE) to quantize motion into discrete tokens and then leverage continuous body shape information to de-quantize these tokens back into continuous, detailed motion. Additionally, we harness the capabilities of a pretrained language model to predict both continuous shape parameters and motion tokens, facilitating the synthesis of text-aligned motions and decoding them into shape-aware motions. We evaluate our method quantitatively and qualitatively, and also conduct a comprehensive perceptual study to demonstrate its efficacy in generating shape-aware motions.
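To make the quantization step concrete, below is a minimal plain-Python sketch of finite scalar quantization (FSQ) as used in FSQ-VAEs: each latent dimension is bounded with `tanh` and rounded to a small integer grid, and the per-dimension codes are packed into a single token index. This is an illustrative simplification (odd level counts only, straight-through gradients and the motion encoder/decoder omitted), not the paper's implementation; the function names and the example level sizes are our own.

```python
import math

def fsq_quantize(z, levels):
    """Finite Scalar Quantization sketch: bound each latent dimension
    with tanh scaled to (-(L-1)/2, (L-1)/2), then round to the nearest
    integer code. Assumes each L in `levels` is odd so the rounded codes
    span exactly L symmetric values. Straight-through gradient omitted."""
    codes = []
    for zi, L in zip(z, levels):
        half = (L - 1) / 2.0
        bounded = half * math.tanh(zi)      # lies in (-half, half)
        codes.append(int(round(bounded)))   # one of L integer codes
    return codes

def code_to_index(codes, levels):
    """Pack per-dimension codes into a single discrete token index via
    mixed-radix encoding over the level sizes (the implied codebook has
    prod(levels) entries, with no learned codebook to collapse)."""
    idx = 0
    for c, L in zip(codes, levels):
        idx = idx * L + (c + (L - 1) // 2)  # shift code to [0, L-1]
    return idx

# Example: a 3-dim latent with 5 levels per dimension -> 125 tokens.
codes = fsq_quantize([10.0, -10.0, 0.0], [5, 5, 5])
token = code_to_index(codes, [5, 5, 5])
```

In the generation pipeline described above, such token indices would form the discrete motion vocabulary the language model predicts, while the continuous shape parameters condition the decoder that maps tokens back to motion.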