🤖 AI Summary
Existing text-driven human motion generation methods rely on global action descriptors (e.g., “running”) and fail to capture velocity variations, joint poses, and kinematic–dynamic constraints. This leads to semantic ambiguity between the text and motion modalities and a lack of fine-grained controllability. To address this, we propose a kinematics-aware joint-group decomposition representation and a hierarchical semantic alignment framework. Specifically, we introduce biomechanically constrained joint grouping for the first time; construct the first automatically generated fine-grained text–motion paired dataset; and design a coarse-to-fine hierarchical semantic fusion and generation architecture that enables joint-level interactive encoding and cross-modal alignment. Experiments demonstrate significant improvements in text-to-motion retrieval accuracy, particularly in joint-spatial understanding, and show that the framework enables high-fidelity, editable local joint motion generation and manipulation.
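To make the decomposition concrete, here is a minimal sketch of splitting a motion sequence into kinematic joint groups. The 22-joint SMPL-style indexing and the five group assignments are illustrative assumptions, not the paper's exact grouping.

```python
# Illustrative joint-group decomposition of a motion tensor.
# Assumption: an SMPL-style 22-joint skeleton; the actual grouping
# used in the paper may differ.
import torch

JOINT_GROUPS = {
    "torso":     [0, 3, 6, 9, 12, 15],  # pelvis, spine chain, neck, head
    "left_arm":  [13, 16, 18, 20],      # collar, shoulder, elbow, wrist
    "right_arm": [14, 17, 19, 21],
    "left_leg":  [1, 4, 7, 10],         # hip, knee, ankle, foot
    "right_leg": [2, 5, 8, 11],
}

def decompose_motion(motion: torch.Tensor) -> dict[str, torch.Tensor]:
    """Split a motion sequence (T, J, C) into per-group sub-sequences.

    T = frames, J = 22 joints, C = per-joint channels (e.g., 3D positions).
    Returns a dict mapping group name -> (T, |group|, C) tensor.
    """
    return {name: motion[:, idx, :] for name, idx in JOINT_GROUPS.items()}

# Usage: a 60-frame clip of 22 joints with 3D positions.
motion = torch.randn(60, 22, 3)
groups = decompose_motion(motion)
print({name: tuple(g.shape) for name, g in groups.items()})
```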
📝 Abstract
Controlling human motion from text is an important challenge in computer vision. Traditional approaches often rely on holistic action descriptions for motion synthesis, which struggle to capture subtle movements of local body parts. This limitation restricts the ability to isolate and manipulate specific movements. To address this, we propose a novel motion representation that decomposes motion into distinct body joint group movements and interactions from a kinematic perspective. We design an automatic dataset collection pipeline that enhances the existing text-motion benchmark by incorporating fine-grained local joint-group motion and interaction descriptions. To bridge the gap between the text and motion domains, we introduce a hierarchical motion semantics approach that progressively fuses joint-level interaction information into the global action-level semantics for modality alignment. With this hierarchy, we introduce a coarse-to-fine motion synthesis procedure for various generation and editing downstream applications. Our quantitative and qualitative experiments demonstrate that the proposed formulation enhances text-motion retrieval by improving joint-spatial understanding, and enables more precise joint-motion generation and control. Project Page: https://andypinxinliu.github.io/KinMo/
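As a rough illustration of the hierarchical fusion idea, the sketch below pools per-joint-group tokens into a single action-level embedding with attention, then aligns it against a text embedding via cosine similarity. The module names, dimensions, and attention-pooling design are assumptions for illustration; the paper's actual architecture may differ.

```python
# Minimal sketch of fusing joint-group semantics into a global action
# embedding for text-motion alignment. Hypothetical design, not the
# paper's exact model.
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4):
        super().__init__()
        # One learnable query that pools group-level tokens into a
        # global action-level token (the coarse end of the hierarchy).
        self.action_query = nn.Parameter(torch.randn(1, 1, dim))
        self.group_to_action = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, group_tokens: torch.Tensor) -> torch.Tensor:
        # group_tokens: (B, n_groups, dim), one embedding per joint group,
        # e.g. produced by per-group temporal encoders over a motion clip.
        B = group_tokens.size(0)
        q = self.action_query.expand(B, -1, -1)
        fused, _ = self.group_to_action(q, group_tokens, group_tokens)
        return self.norm(fused.squeeze(1))  # (B, dim) action embedding

# Usage: fuse 5 joint-group embeddings into one action embedding, then
# score alignment against a text embedding (stand-in random tensor here).
fusion = HierarchicalFusion()
group_tokens = torch.randn(8, 5, 256)
action_emb = fusion(group_tokens)
text_emb = torch.randn(8, 256)  # placeholder for a text encoder output
sim = torch.cosine_similarity(action_emb, text_emb, dim=-1)
print(sim.shape)  # torch.Size([8])
```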