Unconditional Human Motion and Shape Generation via Balanced Score-Based Diffusion

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses unconditional joint generation of human motion and 3D body shape. It proposes a score-based diffusion framework that does not rely on kinematic priors or post hoc mesh reconstruction. The method avoids over-parameterized inputs and auxiliary losses, using only the standard L2 score-matching loss. It introduces a kinematics-aware feature-space normalization and analytically derives the loss weights so that training stays balanced across features. Each component is motivated by theoretical analysis and checked with a targeted ablation. Experiments show results on par with the state of the art in unconditional motion generation. Moreover, the method achieves, for the first time, high-fidelity, temporally coherent 3D body shape generation directly from diffusion sampling, significantly outperforming conventional two-stage pipelines that first predict keypoints and then reconstruct meshes.

📝 Abstract
Recent work has explored a range of model families for human motion generation, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and diffusion-based models. Despite their differences, many methods rely on over-parameterized input features and auxiliary losses to improve empirical results. These strategies should not be strictly necessary for diffusion models to match the human motion distribution. We show that results on par with the state of the art in unconditional human motion generation are achievable with a score-based diffusion model using only careful feature-space normalization and analytically derived weightings for the standard L2 score-matching loss. The model generates both motion and shape directly, thereby avoiding slow post hoc shape recovery from joints. We build the method step by step, with a clear theoretical motivation for each component, and provide targeted ablations demonstrating the effectiveness of each proposed addition in isolation.
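For orientation, the "standard L2 score-matching loss" mentioned in the abstract is usually written in its denoising form. The expression below is the generic textbook objective, not the paper's exact formulation: the additive Gaussian perturbation, the noise scale \sigma_t, and the weighting \lambda(t) are assumptions made here for illustration, and the analytically derived weights the abstract refers to are not given on this page.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{t,\,x_0,\,\epsilon}
    \left[ \lambda(t)\,
      \left\| s_\theta(x_t, t) + \frac{\epsilon}{\sigma_t} \right\|_2^2
    \right],
\qquad
x_t = x_0 + \sigma_t\,\epsilon,
\quad
\epsilon \sim \mathcal{N}(0, I)
```

The target -\epsilon / \sigma_t is the score of the Gaussian perturbation kernel, since \nabla_{x_t} \log p(x_t \mid x_0) = -(x_t - x_0)/\sigma_t^2.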
Problem

Research questions and friction points this paper is trying to address.

Generating realistic human motion and shape without conditioning inputs
Eliminating over-parameterized features and auxiliary losses in diffusion models
Avoiding slow post-processing shape recovery from joint positions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Score-based diffusion model for motion generation
Feature-space normalization and analytical loss weightings
Direct generation of both motion and shape
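
The ideas listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `normalize_features`, `dsm_loss`, and the `weights` vector are hypothetical names, plain per-dimension standardization stands in for the paper's kinematics-aware normalization, and the `sigma ** 2` factor is a common default rather than the analytically derived weighting.

```python
import numpy as np


def normalize_features(x, eps=1e-8):
    """Standardize each feature column to zero mean and unit variance.

    Assumption: simple per-dimension statistics; the paper's
    kinematics-aware normalization is not described on this page.
    """
    mean = x.mean(axis=0)
    std = x.std(axis=0) + eps
    return (x - mean) / std, mean, std


def dsm_loss(score_fn, x0, sigma, weights, rng):
    """Weighted L2 denoising score-matching loss at one noise level.

    score_fn(xt, sigma) returns an array shaped like xt.
    weights holds one (hypothetical) weight per feature dimension.
    """
    noise = rng.standard_normal(x0.shape)
    xt = x0 + sigma * noise              # perturb clean data with Gaussian noise
    target = -noise / sigma              # score of the perturbation kernel
    residual = score_fn(xt, sigma) - target
    # sigma**2 is a common noise-level weighting, assumed here for illustration.
    per_feature = (sigma ** 2) * residual ** 2
    return float(np.mean(per_feature @ weights))
```

As a sanity check, for data drawn from a standard normal the exact score of the noised distribution is `-xt / (1 + sigma**2)`, and passing that function as `score_fn` should give a lower loss than a score function that always returns zeros.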
Authors
David Björkstrand
Robotics Perception and Learning, Royal Institute of Technology, Sweden, Stockholm
Tiesheng Wang
EA Sports TRACAB, Sweden, Stockholm
Lars Bretzner
EA Sports TRACAB, Sweden, Stockholm
Josephine Sullivan
Lecturer, Computer Science, Royal Institute of Technology, Stockholm
Computer vision, machine learning, statistics