Motion Diffusion Autoencoders: Enabling Attribute Manipulation in Human Motion Demonstrated on Karate Techniques

📅 2025-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses fine-grained semantic attribute manipulation of human motion data, exemplified by karate movements, and presents what the authors describe as the first successful approach to changing individual motion attributes while leaving all other aspects of the motion unaffected. Methodologically: (i) a rotation-based pose representation is introduced that disentangles the human skeleton from the motion trajectory while still allowing accurate reconstruction of the original anatomy; (ii) a two-component architecture, consisting of a Transformer encoder that captures high-level semantics and a diffusion probabilistic model that captures the remaining stochastic variation, yields a linear, semantically meaningful embedding space; (iii) high-level attributes, including force, velocity, and stylistic characteristics, are edited independently by moving embeddings along attribute-specific linear directions in this space. Evaluation on the karate dataset demonstrates precise, multi-attribute control (e.g., attack intensity, tempo), with low reconstruction error and natural, fluid synthesized motion. The code and dataset are publicly released.
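To illustrate what separating the skeleton from the motion can mean in a rotation-based representation, the following is a minimal sketch assuming a standard forward-kinematics setup: each frame stores per-joint rotations relative to the parent joint plus a root position (the trajectory), while bone offsets (the skeleton) are stored separately. This is not the paper's exact representation; the functions `rotation_z` and `forward_kinematics` and the three-joint chain are hypothetical.

```python
import numpy as np


def rotation_z(angle: float) -> np.ndarray:
    """3x3 rotation matrix about the z-axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def forward_kinematics(parents, offsets, local_rotations, root_position):
    """Recover global joint positions for a single frame (hypothetical helper).

    parents[j]: index of joint j's parent (-1 for the root); parents precede children.
    offsets[j]: rest-pose bone vector from the parent to joint j (the skeleton).
    local_rotations[j]: 3x3 rotation of joint j relative to its parent (the motion).
    root_position: global root translation for this frame (the trajectory).
    """
    n = len(parents)
    global_rot = [None] * n
    positions = np.zeros((n, 3))
    for j, p in enumerate(parents):
        if p == -1:
            # The root is placed by the trajectory and rotated by its own rotation.
            global_rot[j] = local_rotations[j]
            positions[j] = root_position
        else:
            # Accumulate rotations down the chain; the parent's global rotation
            # orients the bone offset that leads to joint j.
            global_rot[j] = global_rot[p] @ local_rotations[j]
            positions[j] = positions[p] + global_rot[p] @ offsets[j]
    return positions


# Hypothetical 3-joint chain (e.g. shoulder -> elbow -> wrist).
parents = [-1, 0, 1]
skeleton_a = np.array([[0.0, 0, 0], [1.0, 0, 0], [1.0, 0, 0]])
skeleton_b = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0]])  # longer forearm
rotations = [rotation_z(np.pi / 2), np.eye(3), np.eye(3)]  # identical motion for both
root = np.zeros(3)

pose_a = forward_kinematics(parents, skeleton_a, rotations, root)
pose_b = forward_kinematics(parents, skeleton_b, rotations, root)
```

Because the rotations carry no bone lengths, the same motion can be replayed on a different skeleton, and the original anatomy is recovered exactly when its own offsets are used.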

📝 Abstract

Attribute manipulation deals with the problem of changing individual attributes of a data point or a time series, while leaving all other aspects unaffected. This work focuses on the domain of human motion, more precisely karate movement patterns. To the best of our knowledge, it is the first successful attempt at manipulating attributes of human motion data. One of the key requirements for achieving attribute manipulation on human motion is a suitable pose representation. Therefore, we design a novel rotation-based pose representation that enables the disentanglement of the human skeleton from the motion trajectory, while still allowing an accurate reconstruction of the original anatomy. The core idea of the manipulation approach is to use a transformer encoder for discovering high-level semantics, and a diffusion probabilistic model for modeling the remaining stochastic variations. We show that the embedding space obtained from the transformer encoder is semantically meaningful and linear. This enables the manipulation of high-level attributes by discovering their linear direction of change in the semantic embedding space and moving the embedding along that direction. The code and data are available at https://github.com/anthony-mendil/MoDiffAE.
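The linear manipulation step can be sketched as follows, assuming semantic embeddings have already been produced by the encoder and each sample has a binary attribute label. As a simple stand-in for however the paper estimates directions, the direction here is the normalized difference between the two class means; the weight vector of a trained linear classifier is another common choice. The functions `attribute_direction` and `manipulate` and the toy embeddings are hypothetical, and decoding the edited embedding with the diffusion model is not shown.

```python
import numpy as np


def attribute_direction(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unit vector pointing from the mean embedding of
    samples without the attribute (label 0) to the mean of samples with it (label 1)."""
    pos_mean = embeddings[labels == 1].mean(axis=0)
    neg_mean = embeddings[labels == 0].mean(axis=0)
    direction = pos_mean - neg_mean
    return direction / np.linalg.norm(direction)


def manipulate(z: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Hypothetical helper: move a semantic embedding along the attribute direction."""
    return z + strength * direction


# Toy semantic embeddings (hypothetical): the attribute only shifts dimension 0.
embeddings = np.array([
    [0.0, 1.0, 0.5],
    [0.2, -1.0, 0.5],
    [2.0, 1.0, 0.5],
    [2.2, -1.0, 0.5],
])
labels = np.array([0, 0, 1, 1])

direction = attribute_direction(embeddings, labels)
z = embeddings[0]
z_edit = manipulate(z, direction, strength=2.0)
```

Because the direction is a unit vector, `strength` is the distance moved along it: the projection onto the direction grows by exactly `strength`, while the components orthogonal to it stay unchanged.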

Problem

Research questions and friction points this paper is trying to address.

Motion Manipulation

Action Recognition

Feature Adjustment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer Encoder

Diffusion Probabilistic Model

Linear Adjustment of Motion Features
