Rethinking Diffusion for Text-Driven Human Motion Generation

📅 2024-11-25
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
Existing VQ-based methods for text-driven human motion generation suffer from information loss, limited diversity, and weak prior knowledge, while current diffusion models still underperform VQ approaches in fidelity and controllability. Method: This paper proposes a novel continuous-space diffusion framework grounded in motion representation learning and distribution modeling. It introduces bidirectional masked autoregression into diffusion-based motion generation for the first time; designs a data reparameterization scheme that preserves motion continuity and enforces physical constraints; and establishes a continuous probabilistic paradigm for text–motion alignment. A cross-paradigm robust evaluation protocol is also developed. Contribution/Results: The framework achieves state-of-the-art performance across multiple benchmarks, significantly improving motion diversity, physical plausibility, and text–motion alignment accuracy, comprehensively outperforming both leading VQ-based and diffusion-based methods.

📝 Abstract
Since 2023, Vector Quantization (VQ)-based discrete generation methods have rapidly come to dominate human motion generation, surpassing diffusion-based continuous generation methods on standard performance metrics. However, VQ-based methods have inherent limitations: representing continuous motion data as a limited set of discrete tokens causes inevitable information loss, reduces the diversity of generated motions, and restricts their ability to serve as motion priors or generation guidance. In contrast, the continuous-space nature of diffusion-based generation makes it well suited to address these limitations, with additional potential for model scalability. In this work, we systematically investigate why current VQ-based methods perform well and examine the limitations of existing diffusion-based methods from the perspective of motion data representation and distribution. Drawing on these insights, we preserve the inherent strengths of a diffusion-based human motion generation model and gradually optimize it with inspiration from VQ-based approaches. Our approach introduces a human motion diffusion model able to perform bidirectional masked autoregression, optimized with a reformed data representation and distribution. We also propose more robust evaluation methods to fairly assess methods of both paradigms. Extensive experiments on benchmark human motion generation datasets demonstrate that our method surpasses previous methods and achieves state-of-the-art performance.
Problem

Research questions and friction points this paper is trying to address.

Overcome information loss in VQ-based human motion generation
Enhance diversity and scalability of diffusion-based motion generation
Improve evaluation robustness for human motion generation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Masked autoregression in diffusion model
Reformed data representation and distribution
Robust evaluation method for assessment
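The first innovation above, masked autoregression inside a diffusion model, can be illustrated with a minimal, hypothetical sketch: all motion frames start masked, and over several rounds a random subset is unmasked, each new frame generated by a per-frame denoising head conditioned on every already-visible frame, past and future alike (hence "bidirectional"). Everything here is an assumption for illustration; in particular, `toy_denoise` is a crude averaging placeholder, not the paper's learned diffusion head, and the frame dimensions and schedule are invented.

```python
import numpy as np

def toy_denoise(noisy, cond, steps=10):
    # Hypothetical stand-in for a per-frame diffusion head: pull a noisy
    # sample toward the mean of the visible conditioning frames over a few
    # steps. A real model would apply a learned noise predictor instead.
    target = cond.mean(axis=0)
    x = noisy
    for _ in range(steps):
        x = x + 0.3 * (target - x)  # crude "denoising" step
    return x

def masked_autoregressive_sample(num_frames=16, dim=8, rounds=4, seed=0):
    # Bidirectional masked autoregression: begin with all frames masked,
    # then over several rounds unmask a random subset, generating each new
    # frame conditioned on every already-visible frame (both directions).
    rng = np.random.default_rng(seed)
    motion = np.zeros((num_frames, dim))
    visible = np.zeros(num_frames, dtype=bool)
    # seed one frame so the first round has conditioning context
    first = rng.integers(num_frames)
    motion[first] = rng.standard_normal(dim)
    visible[first] = True
    for _ in range(rounds):
        hidden = np.flatnonzero(~visible)
        if hidden.size == 0:
            break
        # unmask roughly half of the remaining frames each round
        pick = rng.choice(hidden, size=max(1, hidden.size // 2), replace=False)
        cond = motion[visible]  # conditioning context fixed per round
        for i in pick:
            motion[i] = toy_denoise(rng.standard_normal(dim), cond)
            visible[i] = True
    # fill anything still masked after the scheduled rounds
    for i in np.flatnonzero(~visible):
        motion[i] = toy_denoise(rng.standard_normal(dim), motion[visible])
        visible[i] = True
    return motion, visible
```

The coarse-to-fine schedule (halving the masked set each round) mirrors how masked-token generators trade off parallelism against conditioning quality; the continuous per-frame denoiser is what distinguishes this from VQ-style discrete token prediction.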