LaMoGen: Language to Motion Generation Through LLM-Guided Symbolic Inference

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-motion generation methods rely on end-to-end embeddings, struggling to simultaneously achieve temporal precision, fine-grained detail, and interpretability. This work proposes LaMoGen, a novel framework that introduces LabanLite—a lightweight symbolic system derived from Labanotation—and establishes a three-stage “text–symbol–motion” generation pipeline. For the first time, large language models (LLMs) are leveraged at the symbolic level to perform motion reasoning and recombination. This approach enables an interpretable mapping between language and motion, significantly outperforming state-of-the-art methods on both a newly curated Labanotation benchmark and two public datasets. The proposed method achieves substantial improvements in motion interpretability, controllability, and alignment between textual descriptions and generated motions.

📝 Abstract
Human motion is highly expressive and naturally aligned with language, yet prevailing methods relying heavily on joint text-motion embeddings struggle to synthesize temporally accurate, detailed motions and often lack explainability. To address these limitations, we introduce LabanLite, a motion representation developed by adapting and extending the Labanotation system. Unlike black-box text-motion embeddings, LabanLite encodes each atomic body-part action (e.g., a single left-foot step) as a discrete Laban symbol paired with a textual template. This abstraction decomposes complex motions into interpretable symbol sequences and body-part instructions, establishing a symbolic link between high-level language and low-level motion trajectories. Building on LabanLite, we present LaMoGen, a Text-to-LabanLite-to-Motion Generation framework that enables large language models (LLMs) to compose motion sequences through symbolic reasoning. The LLM interprets motion patterns, relates them to textual descriptions, and recombines symbols into executable plans, producing motions that are both interpretable and linguistically grounded. To support rigorous evaluation, we introduce a Labanotation-based benchmark with structured description-motion pairs and three metrics that jointly measure text-motion alignment across symbolic, temporal, and harmony dimensions. Experiments demonstrate that LaMoGen establishes a new baseline for both interpretability and controllability, outperforming prior methods on our benchmark and two public datasets. These results highlight the advantages of symbolic reasoning and agent-based design for language-driven motion synthesis.
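As a rough illustration of the symbol–template pairing the abstract describes (each atomic body-part action as a discrete symbol plus a textual template, later recombined into an executable plan), one might encode it as follows. All class, field, and function names here are assumptions for illustration, not the paper's actual LabanLite schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabanSymbol:
    """Hypothetical atomic action: a discrete symbol paired with a textual template."""
    body_part: str      # e.g. "left_foot"
    direction: str      # Labanotation-style direction sign, e.g. "forward"
    level: str          # "low" / "middle" / "high"
    beats: int          # duration in beats
    template: str       # textual template with placeholders

    def describe(self) -> str:
        """Render the body-part instruction from the textual template."""
        return self.template.format(
            part=self.body_part.replace("_", " "),
            dir=self.direction,
            lvl=self.level,
            n=self.beats,
        )

def compose_plan(symbols: list[LabanSymbol]) -> list[str]:
    """Recombine symbols into an ordered, human-readable motion plan —
    the kind of executable plan an LLM could emit at the symbolic level."""
    plan, t = [], 0
    for s in symbols:
        plan.append(f"[beat {t}-{t + s.beats}] {s.describe()}")
        t += s.beats
    return plan

step = LabanSymbol("left_foot", "forward", "middle", 1,
                   "step the {part} {dir} at {lvl} level for {n} beat(s)")
reach = LabanSymbol("right_arm", "up", "high", 2,
                    "raise the {part} {dir} to {lvl} level over {n} beat(s)")
for line in compose_plan([step, reach]):
    print(line)
# → [beat 0-1] step the left foot forward at middle level for 1 beat(s)
# → [beat 1-3] raise the right arm up to high level over 2 beat(s)
```

This sketch only shows why such a representation is interpretable: every symbol maps to a readable instruction, and timing falls out of the beat counts, so a plan can be inspected and edited before any motion is synthesized.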
Problem

Research questions and friction points this paper is trying to address.

text-to-motion generation
motion interpretability
temporal accuracy
language-motion alignment
symbolic representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

symbolic reasoning
Labanotation
interpretable motion generation
LLM-guided generation
text-to-motion synthesis
Junkun Jiang
Hong Kong Baptist University
Computer Vision · Human Pose Estimation · Motion Capture
Ho Yin Au
Department of Computer Science, Hong Kong Baptist University, HKSAR
Jingyu Xiang
Department of Computer Science, Hong Kong Baptist University, HKSAR
Jie Chen
Hong Kong Baptist University
Computational Photography · Multimedia · 3D · Art-Tech