🤖 AI Summary
Existing animation retargeting methods rely on templates, rigged skeletons, or annotated data, and suffer from poor generalization and motion jitter. This paper proposes the first fully self-supervised, universal animation transfer framework: it requires no templates, skeletons, or human annotations, and takes only sparse motion signals (2D/3D keypoint sequences) as input to robustly transfer motions onto arbitrary mesh characters, regardless of topology or geometry. The core innovation is Kinetic Codes: semantically rich motion latent representations learned via an autoencoder, coupled with a spatiotemporal gradient prediction network for end-to-end motion reconstruction. Evaluated on multi-source datasets including AMASS, the method achieves state-of-the-art generalization, significantly improving adaptability to unseen motions and diverse characters, including non-human topologies, while effectively suppressing motion jitter.
📝 Abstract
Animation retargeting involves applying a sparse motion description (e.g., 2D/3D keypoint sequences) to a given character mesh to produce a semantically plausible and temporally coherent full-body motion. Existing approaches come with a mix of restrictions: they require annotated training data, assume access to template-based shape priors or artist-designed deformation rigs, suffer from limited generalization to unseen motions and/or shapes, or exhibit motion jitter. We propose Self-supervised Motion Fields (SMF), a self-supervised framework that can be robustly trained with sparse motion representations, without requiring dataset-specific annotations, templates, or rigs. At the heart of our method are Kinetic Codes, a novel autoencoder-based sparse motion encoding that exposes a semantically rich latent space, simplifying large-scale training. Our architecture comprises dedicated spatial and temporal gradient predictors, which are trained end-to-end. The resulting network, regularized by the Kinetic Codes' latent space, generalizes well across shapes and motions. We evaluate our method on unseen motions sampled from AMASS, D4D, Mixamo, and raw monocular video, performing animation transfer onto characters of varying shape and topology. We report a new SoTA on the AMASS dataset in the context of generalization to unseen motion. Project webpage: https://motionfields.github.io/
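To make the pipeline described above concrete, here is a minimal NumPy sketch of the data flow only: a sparse keypoint sequence is compressed into a per-frame latent "kinetic code" by an autoencoder, and stub spatial/temporal predictors, conditioned on that code, emit per-vertex and per-frame outputs for an arbitrary mesh. All dimensions, function names, and the linear "networks" are illustrative assumptions, not the paper's actual architecture or training setup; the real system learns these mappings end-to-end with a self-supervised reconstruction objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): T frames, K keypoints, V mesh vertices.
T, K, V, LATENT = 16, 22, 500, 64

def encode_kinetic_code(keypoints, W_enc):
    """Compress the sparse keypoint sequence into per-frame latent codes."""
    flat = keypoints.reshape(T, K * 3)        # (T, K*3)
    return np.tanh(flat @ W_enc)              # (T, LATENT)

def decode_keypoints(codes, W_dec):
    """Autoencoder decoder: reconstruct keypoints from the latent codes."""
    return (codes @ W_dec).reshape(T, K, 3)

def predict_gradients(codes, verts, W_spat, W_temp):
    """Stub spatial/temporal predictors conditioned on the kinetic code."""
    cond = codes.mean(axis=0)                 # pooled motion context, (LATENT,)
    spatial = verts @ W_spat + cond[:3]       # per-vertex spatial offset, (V, 3)
    temporal = codes @ W_temp                 # per-frame temporal delta, (T, 3)
    return spatial, temporal

# Random weights stand in for trained networks.
W_enc = rng.normal(0.0, 0.1, (K * 3, LATENT))
W_dec = rng.normal(0.0, 0.1, (LATENT, K * 3))
W_spat = rng.normal(0.0, 0.1, (3, 3))
W_temp = rng.normal(0.0, 0.1, (LATENT, 3))

keypoints = rng.normal(size=(T, K, 3))        # sparse motion input
verts = rng.normal(size=(V, 3))               # arbitrary character mesh

codes = encode_kinetic_code(keypoints, W_enc)
recon = decode_keypoints(codes, W_dec)
spatial, temporal = predict_gradients(codes, verts, W_spat, W_temp)

# Self-supervised signal: reconstruct the sparse input itself (no annotations).
loss = float(np.mean((recon - keypoints) ** 2))
print(codes.shape, spatial.shape, temporal.shape, loss > 0.0)
```

The key property mirrored here is that supervision comes only from reconstructing the sparse input sequence, so no rigs, templates, or labeled correspondences are required.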