MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer

📅 2025-04-11
🤖 AI Summary
To address overfitting, poor generalization, and the trade-off between motion diversity and naturalness when training a generative model on a single motion capture (MoCap) reference, this paper proposes a localized generative masked modeling paradigm. The method comprises three key components: (1) a distribution-regularized motion vector quantization codebook that improves codebook generalizability; (2) a sliding-window local attention mechanism that models local temporal motion patterns; and (3) a topology-agnostic motion embedding that handles motions of arbitrary skeletal topology. Under stringent small-data constraints, the approach outperforms state-of-the-art GAN- and diffusion-based methods in both fidelity (e.g., FID, MMD) and diversity (e.g., LPIPS, diversity score). It also supports practical applications driven by a single reference motion, including temporal motion editing, crowd animation synthesis, and beat-aligned dance generation.

📝 Abstract
Generative masked transformers have demonstrated remarkable success across various content generation tasks, primarily due to their ability to effectively model large-scale dataset distributions with high consistency. However, in the animation domain, large datasets are not always available. Applying generative masked modeling to generate diverse instances from a single MoCap reference may lead to overfitting, a challenge that remains unexplored. In this work, we present MotionDreamer, a localized masked modeling paradigm designed to learn internal motion patterns from a given motion with arbitrary topology and duration. By embedding the given motion into quantized tokens with a novel distribution regularization method, MotionDreamer constructs a robust and informative codebook for local motion patterns. Moreover, a sliding window local attention is introduced in our masked transformer, enabling the generation of natural yet diverse animations that closely resemble the reference motion patterns. As demonstrated through comprehensive experiments, MotionDreamer outperforms the state-of-the-art methods, which are typically GAN- or diffusion-based, in both faithfulness and diversity. Thanks to the consistency and robustness of the quantization-based approach, MotionDreamer can also effectively perform downstream tasks such as temporal motion editing, crowd animation, and beat-aligned dance generation, all using a single reference motion. Visit our project page: https://motiondreamer.github.io/
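To make "embedding the given motion into quantized tokens" concrete, here is a minimal sketch of the generic vector-quantization lookup step: each per-frame feature vector is replaced by the index of its nearest codebook entry. The function name `quantize`, the 2-D toy features, and the three-entry codebook are assumptions made for illustration only. The paper's distribution regularization and its learned encoder are not reproduced here.

```python
def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Generic nearest-neighbour VQ lookup (squared Euclidean distance).
    Illustrative only: not the paper's regularized codebook.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]


# Hypothetical toy data: a 3-entry codebook and 4 motion feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [1.8, 0.1], [1.1, 0.9]]

tokens = quantize(frames, codebook)
print(tokens)
```

The resulting token sequence is what a masked transformer would then learn to fill in; with a single reference motion, how evenly the codebook entries are used becomes a central concern, which is the role the abstract assigns to distribution regularization.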
Problem

Research questions and friction points this paper is trying to address.

Overcoming overfitting in generative masked transformers for motion synthesis
Learning internal motion patterns from single MoCap references
Generating diverse animations resembling reference motion patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Localized generative masked transformer for motion synthesis
Quantized tokens with distribution regularization method
Sliding window local attention for diverse animations
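The sliding window local attention listed above can be illustrated with a small attention-mask sketch: each token may attend only to neighbours within a fixed temporal window. The symmetric window, the name `sliding_window_mask`, and the window size of 3 are assumptions for illustration; the paper's exact window layout and size are not stated on this page.

```python
def sliding_window_mask(num_tokens, window):
    """Build a boolean attention mask for local attention.

    mask[i][j] is True when token i may attend to token j, that is,
    when |i - j| <= window // 2 (symmetric window, an assumption).
    """
    half = window // 2
    return [
        [abs(i - j) <= half for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


# Hypothetical example: 6 motion tokens, each seeing a window of 3 tokens.
mask = sliding_window_mask(6, 3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Restricting attention this way keeps each prediction conditioned on nearby tokens only, which fits the abstract's goal of learning local motion patterns rather than memorizing the full reference sequence.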
Authors

Yilin Wang, University of Alberta
Chuan Guo, University of Alberta
Yuxuan Mu, Simon Fraser University (3D Computer Vision, Computer Animation)
Muhammad Gohar Javed, University of Alberta
X. Zuo, Concordia University
Juwei Lu, Senior Principal Researcher at Huawei Noah's Ark Laboratory - Canada (deep learning, computer vision, big data, video analytics, machine learning)
Hai Jiang, University of Alberta
Li Cheng, University of Alberta