AI Summary
This paper addresses fine-grained semantic attribute manipulation of human motion data, exemplified by karate movements, and proposes what the authors believe to be the first controllable motion editing framework that ensures anatomical plausibility and structural preservation. Methodologically: (i) a rotation-based skeletal representation is introduced to disentangle the motion trajectory from the human skeleton while still allowing the original anatomy to be reconstructed; (ii) a two-stage architecture, comprising a Transformer encoder and a diffusion model, is designed to learn linear, semantically interpretable latent embeddings; (iii) high-level attributes, including force, velocity, and stylistic characteristics, are edited independently via directional vectors in the latent space. Extensive evaluation on the Karate dataset demonstrates precise, multi-attribute control (e.g., attack intensity, tempo), achieving low reconstruction error and natural, fluid motion synthesis. The code and dataset are publicly released.
Abstract
Attribute manipulation deals with the problem of changing individual attributes of a data point or a time series, while leaving all other aspects unaffected. This work focuses on the domain of human motion, more precisely karate movement patterns. To the best of our knowledge, it presents the first success at manipulating attributes of human motion data. One of the key requirements for achieving attribute manipulation on human motion is a suitable pose representation. Therefore, we design a novel rotation-based pose representation that enables the disentanglement of the human skeleton and the motion trajectory, while still allowing an accurate reconstruction of the original anatomy. The core idea of the manipulation approach is to use a transformer encoder for discovering high-level semantics, and a diffusion probabilistic model for modeling the remaining stochastic variations. We show that the embedding space obtained from the transformer encoder is semantically meaningful and linear. This enables the manipulation of high-level attributes, by discovering their linear direction of change in the semantic embedding space and moving the embedding along said direction. The code and data are available at https://github.com/anthony-mendil/MoDiffAE.
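The linear manipulation step described above can be illustrated with a minimal sketch. The abstract does not say how the direction of change is discovered, so the example assumes a simple stand-in: the normalized difference between the mean embeddings of samples with and without the attribute. The function names (`attribute_direction`, `manipulate`), the two-dimensional toy embeddings, and the `strength` parameter are all hypothetical and are not taken from the MoDiffAE code. In the full pipeline the edited embedding would presumably condition the diffusion model to produce the modified motion, which is not shown here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty collection of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def attribute_direction(
    with_attr: Sequence[Sequence[float]],
    without_attr: Sequence[Sequence[float]],
) -> Vector:
    """Unit vector pointing from the 'without' class mean to the 'with' class mean.

    Assumption: a mean-difference direction stands in for whatever
    direction-finding method the paper actually uses.
    """
    mu_pos = mean_vector(with_attr)
    mu_neg = mean_vector(without_attr)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("class means coincide; no direction can be derived")
    return [x / norm for x in diff]


def manipulate(embedding: Sequence[float], direction: Sequence[float], strength: float) -> Vector:
    """Move an embedding along the attribute direction by the given strength."""
    return [z + strength * d for z, d in zip(embedding, direction)]


# Toy semantic embeddings: the first coordinate encodes the attribute,
# the second encodes everything else and should stay untouched.
low = [[0.0, 1.0], [0.0, 3.0]]
high = [[2.0, 1.0], [2.0, 3.0]]

direction = attribute_direction(high, low)
edited = manipulate([0.0, 2.0], direction, strength=1.5)
```

In this toy setup only the first coordinate of the embedding changes, which mirrors the goal of altering one attribute while leaving all other aspects unaffected. With real embeddings the direction would rarely align with a single axis, and a negative `strength` would move the sample toward the opposite attribute value.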