🤖 AI Summary
Existing text-to-motion generation models struggle to capture fine-grained stylistic nuances (e.g., a "chicken" style) because style-specific training data is severely scarce, leading to stylistic distortion and distributional shift in generated motions. To address this, we propose LoRA-MDM, a lightweight framework that integrates Low-Rank Adaptation (LoRA) into Motion Diffusion Models (MDM). Instead of editing each motion frame-wise during generation, LoRA-MDM injects stylistic priors into the diffusion prior's layers, enabling explicit disentanglement of style and semantics. The method comprises two core components: style-prior alignment and semantic motion manifold offset learning. With only a few reference motion samples, LoRA-MDM supports cross-style transfer, multi-style fusion, and interactive, controllable editing. It significantly improves style consistency while preserving text fidelity, and it generalizes to unseen actions with high-quality stylized generation.
📝 Abstract
Text-to-motion generative models span a wide range of 3D human actions but struggle with nuanced stylistic attributes such as a "Chicken" style. Due to the scarcity of style-specific data, existing approaches pull the generative prior towards a reference style, which often results in out-of-distribution, low-quality generations. In this work, we introduce LoRA-MDM, a lightweight framework for motion stylization that generalizes to complex actions while maintaining editability. Our key insight is that adapting the generative prior to include the style, while preserving its overall distribution, is more effective than modifying each individual motion during generation. Building on this idea, LoRA-MDM learns to adapt the prior to include the reference style using only a few samples. The style can then be used in the context of different textual prompts for generation. The low-rank adaptation shifts the motion manifold in a semantically meaningful way, enabling realistic style infusion even for actions not present in the reference samples. Moreover, preserving the distribution structure enables advanced operations such as style blending and motion editing. We compare LoRA-MDM to state-of-the-art stylized motion generation methods and demonstrate a favorable balance between text fidelity and style consistency.
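To make the mechanism concrete, here is a minimal NumPy sketch of the general LoRA idea the abstract builds on: a frozen base weight plus a trainable low-rank offset, which can be scaled or convexly combined across adapters for style blending. The class and function names (`LoRALinear`, `blended_weight`), dimensions, and initialization choices are illustrative assumptions, not the paper's actual architecture or training code.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A.

    Hypothetical sketch of Low-Rank Adaptation (LoRA): only A and B
    (rank r << min(d_in, d_out)) are trained, so a style adapter stores
    far fewer parameters than the full weight, and the base generative
    prior W is left untouched.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=1.0):
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen prior weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01            # trainable down-projection
        self.B = np.zeros((d_out, rank))                             # trainable up-projection, zero-init
        self.alpha = alpha                                           # adapter scaling factor

    def delta(self):
        # Low-rank offset added to the frozen weight; rank is at most `rank`.
        return self.alpha * (self.B @ self.A)

    def forward(self, x, scale=1.0):
        # scale=0.0 recovers the unmodified generative prior exactly.
        return (self.W + scale * self.delta()) @ x


def blended_weight(W, adapters, weights):
    """Style blending sketch: add a convex combination of LoRA offsets
    (one per learned style) to the shared frozen weight."""
    return W + sum(w * a.delta() for a, w in zip(adapters, weights))
```

Because `B` is zero-initialized, the adapted model starts out identical to the base prior, so training only has to learn the offset toward the reference style; setting `scale=0.0` at inference time falls back to the original distribution.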