🤖 AI Summary
Existing methods struggle to generalize motion representations across species and characters with significantly different skeletal topologies, hindering the development of scalable generative models. This work proposes a semantic-aware, topology-agnostic motion representation framework that decouples motion from skeletal structure by aligning functionally corresponding joints through a semantic modulation mechanism, thereby constructing a unified latent motion manifold. The approach enables zero-shot cross-species motion retargeting without paired data and learns a continuous, generation-friendly motion space directly from large-scale, unaligned raw BVH sequences. Experiments demonstrate high-fidelity motion reconstruction on both human and animal datasets, with successful applications in text-to-motion generation and cross-species motion transfer.
📝 Abstract
Generalizing motion representations across diverse characters remains challenging because skeletal topology varies significantly across datasets and species, which hinders the development of scalable generative models. To bridge this gap, we propose a Semantic-Aware Topology-Agnostic framework that learns a unified latent manifold shared by disparate species. Unlike methods that rely on fixed hierarchies or rigid padding strategies, our approach uses a semantic modulation mechanism to align functionally corresponding joints, thereby decoupling motion from topology. This design enables the construction of a continuous, generation-friendly motion space from large-scale, unaligned raw BVH data. Experiments on human and animal datasets demonstrate that our framework achieves high-fidelity reconstruction and supports downstream text-to-motion tasks. Notably, the model enables zero-shot cross-species retargeting without paired data. Code and demos are available at: https://github.com/zzysteve/SATA
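To make the idea of decoupling motion from topology concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a FiLM-style scale-and-shift as a stand-in for the semantic modulation mechanism, followed by mean pooling over joints, so that skeletons with different joint counts map to a latent vector of the same size. All names (`linear`, `encode_pose`, `random_matrix`), the dimensions, and the random weights are hypothetical.

```python
import random


def linear(vec, weight):
    """Multiply a vector of length E by a weight matrix of shape E x D."""
    return [
        sum(v * row[d] for v, row in zip(vec, weight))
        for d in range(len(weight[0]))
    ]


def encode_pose(joint_feats, joint_sems, w_gamma, w_beta):
    """Modulate each joint's features by its semantic embedding, then pool.

    Assumption: modulation is a FiLM-style scale and shift. Mean pooling
    makes the result independent of joint count and joint ordering.
    """
    dim = len(w_gamma[0])
    pooled = [0.0] * dim
    for feat, sem in zip(joint_feats, joint_sems):
        gamma = linear(sem, w_gamma)  # per-joint scale from semantics
        beta = linear(sem, w_beta)    # per-joint shift from semantics
        for d in range(dim):
            pooled[d] += feat[d] * (1.0 + gamma[d]) + beta[d]
    return [p / len(joint_feats) for p in pooled]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of Gaussian samples (placeholder data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


FEAT_DIM, SEM_DIM = 6, 4  # hypothetical feature and semantic sizes
rng = random.Random(0)
w_gamma = random_matrix(SEM_DIM, FEAT_DIM, rng)
w_beta = random_matrix(SEM_DIM, FEAT_DIM, rng)

# Two skeletons with different joint counts (22 and 27 are arbitrary).
human_feats = random_matrix(22, FEAT_DIM, rng)
human_sems = random_matrix(22, SEM_DIM, rng)
quad_feats = random_matrix(27, FEAT_DIM, rng)
quad_sems = random_matrix(27, SEM_DIM, rng)

human_latent = encode_pose(human_feats, human_sems, w_gamma, w_beta)
quad_latent = encode_pose(quad_feats, quad_sems, w_gamma, w_beta)
```

In this sketch both latents have length `FEAT_DIM` regardless of how many joints each skeleton has, and reordering the joints leaves the latent unchanged up to floating-point rounding. A real model would learn the semantic embeddings and weights and would also need to encode temporal information, which is omitted here.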