LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-motion generation models struggle to achieve fine-grained, interpretable control over motion expressivity, primarily due to insufficient stylistic diversity in motion and the inherent difficulty of precisely encoding quantitative kinematic attributes using natural language. This paper introduces the first text-guided diffusion framework integrating Laban Movement Analysis (LMA), explicitly modeling LMA’s “Effort” and “Shape” dimensions as controllable conditioning variables. We further propose a zero-shot text-embedding optimization mechanism that dynamically refines pretrained text embeddings during sampling to align with target LMA labels—requiring no additional motion annotations. Our method preserves motion identity while significantly enhancing both the diversity and controllability of expressive motion. Experiments demonstrate precise, compositional responsiveness to multiple LMA feature combinations, enabling interpretable, fine-grained motion generation.

📝 Abstract
Diverse human motion generation is an increasingly important task with applications in computer vision, human-computer interaction, and animation. While text-to-motion synthesis using diffusion models has succeeded in generating high-quality motions, achieving fine-grained expressive motion control remains a significant challenge, owing to the lack of motion style diversity in datasets and the difficulty of expressing quantitative characteristics in natural language. Laban Movement Analysis has long been used by dance experts to describe the details of motion, including motion quality, as consistently as possible. Inspired by this, our work aims for interpretable and expressive control of human motion generation by seamlessly integrating quantification methods for the Laban Effort and Shape components into text-guided motion generation models. Our zero-shot, inference-time optimization method guides the motion generation model toward desired Laban Effort and Shape components without any additional motion data by updating the text embedding of a pretrained diffusion model during the sampling step. We demonstrate that our approach yields diverse expressive motion qualities while preserving motion identity, successfully manipulating motion attributes according to target Laban tags.
Problem

Research questions and friction points this paper is trying to address.

Achieving fine-grained expressive motion control in text-to-motion generation
Addressing the lack of motion style diversity in existing datasets
Integrating Laban movement quantification into diffusion-based motion synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Laban Effort and Shape quantification methods
Uses zero-shot inference-time optimization for guidance
Updates text embedding during sampling in diffusion models
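The core mechanism described above — gradient-based refinement of a text embedding at sampling time so that the generated motion's quantified Laban scores match a target — can be sketched in miniature. This is a toy illustration under loud assumptions: the real denoiser and Laban Effort/Shape extractors are neural networks from the paper, while here `DENOISE` and `LABAN` are random linear placeholders and `refine_embedding` is a hypothetical helper, so the gradient has a closed form instead of requiring autodiff.

```python
import numpy as np

# Toy stand-ins (assumptions, not the paper's actual models):
# - DENOISE maps a text embedding to a motion feature vector (one sampling step)
# - LABAN maps motion features to quantified Laban Effort/Shape scores
rng = np.random.default_rng(0)
EMB_DIM, MOTION_DIM, LABAN_DIM = 8, 16, 4
DENOISE = rng.normal(size=(MOTION_DIM, EMB_DIM))  # placeholder denoiser
LABAN = rng.normal(size=(LABAN_DIM, MOTION_DIM))  # placeholder Laban extractor

def laban_loss(emb, target):
    """Squared error between predicted and target Laban scores."""
    pred = LABAN @ (DENOISE @ emb)
    return float(np.sum((pred - target) ** 2))

def refine_embedding(emb, target, steps=50):
    """Gradient-descend the text embedding toward target Laban tags.
    With linear stand-ins the end-to-end Jacobian J is a matrix and the
    gradient is analytic; the paper would backpropagate through the real
    networks instead. The step size is chosen from the spectral norm of J
    so each step is guaranteed to reduce the quadratic loss."""
    J = LABAN @ DENOISE                        # end-to-end Jacobian
    lr = 0.5 / np.linalg.norm(J, 2) ** 2       # stable step size
    for _ in range(steps):
        grad = 2.0 * J.T @ (J @ emb - target)  # d(loss)/d(embedding)
        emb = emb - lr * grad
    return emb

emb0 = rng.normal(size=EMB_DIM)                # "pretrained" text embedding
target = rng.normal(size=LABAN_DIM)            # desired Effort/Shape scores
before = laban_loss(emb0, target)
after = laban_loss(refine_embedding(emb0, target), target)
```

In the full method this inner loop would run interleaved with diffusion sampling steps, which is what makes the control zero-shot: no motion data is collected and no model weights change, only the conditioning embedding.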