Self Supervised Networks for Learning Latent Space Representations of Human Body Scans and Motions

📅 2024-11-05

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in 3D human body scan and motion modeling: (1) fast, robust estimation of latent embeddings for unregistered meshes, and (2) learning the geometry of the pose parameter space. It proposes two self-supervised networks, VariShaPE and MoGeN. VariShaPE (Varifold Shape Parameter Estimator) is an architecture for retrieving latent shape and pose codes from arbitrary unregistered meshes. MoGeN (Motion Geometry Network) lifts the pose parameter space of the SMPL latent representation into a higher-dimensional Euclidean space in which short motion sequences from 4D training data are approximated by simple linear interpolation. Once trained, the two networks support motion interpolation, extrapolation, and transfer, as well as random shape and pose generation, at very limited computational cost.

📝 Abstract

This paper introduces self-supervised neural network models to tackle several fundamental problems in the field of 3D human body analysis and processing. First, we propose VariShaPE (Varifold Shape Parameter Estimator), a novel architecture for the retrieval of latent space representations of body shapes and poses. This network offers a fast and robust method to estimate the embedding of arbitrary unregistered meshes into the latent space. Second, we complement the estimation of latent codes with MoGeN (Motion Geometry Network), a framework that learns the geometry of the latent space itself. This is achieved by lifting the body pose parameter space into a higher-dimensional Euclidean space in which body motion mini-sequences from a training set of 4D data can be approximated by simple linear interpolation. Using the SMPL latent space representation, we illustrate how the combination of these network models, once trained, can be used to perform a variety of tasks with very limited computational cost. These include motion interpolation, extrapolation, and transfer, as well as random shape and pose generation.
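
To make the lifting idea concrete, here is a minimal NumPy sketch of interpolation and extrapolation along a straight line in a lifted space. It is not the paper's MoGeN network: `lift` and `project` are hypothetical stand-ins (a fixed random linear map and its pseudo-inverse), `LIFTED_DIM` is an arbitrary choice, and the 72-dimensional pose vector assumes the standard SMPL axis-angle layout. Because the stand-in is linear, the resulting motion coincides with plain linear interpolation in pose space; a learned nonlinear lift would generally give a different path.

```python
import numpy as np

POSE_DIM = 72     # assumed SMPL pose layout: 24 joints x 3 axis-angle values
LIFTED_DIM = 128  # hypothetical size of the higher-dimensional Euclidean space

rng = np.random.default_rng(0)

# Toy stand-in for a learned lifting network (NOT the paper's architecture):
# a fixed random linear map and its pseudo-inverse.
W = rng.standard_normal((POSE_DIM, LIFTED_DIM))
W_pinv = np.linalg.pinv(W)


def lift(pose):
    """Map pose parameters into the lifted space (hypothetical encoder)."""
    return pose @ W


def project(z):
    """Map lifted codes back to pose parameters (hypothetical decoder)."""
    return z @ W_pinv


def linear_path(z_start, z_end, num_frames, t_max=1.0):
    """Straight line from z_start toward z_end.

    t runs from 0 to t_max; t_max > 1 continues past z_end (extrapolation).
    """
    t = np.linspace(0.0, t_max, num_frames)[:, None]
    return (1.0 - t) * z_start + t * z_end


# Two random example poses standing in for the first and last frame.
pose_a = 0.1 * rng.standard_normal(POSE_DIM)
pose_b = 0.1 * rng.standard_normal(POSE_DIM)

# Seven frames: t = 0, 0.25, ..., 1.0 interpolate; t = 1.25, 1.5 extrapolate.
z_path = linear_path(lift(pose_a), lift(pose_b), num_frames=7, t_max=1.5)
motion = project(z_path)  # one pose vector per frame, shape (7, 72)
```

The only per-frame work is one weighted sum and one decoder call, which is the sense in which such operations can be cheap once the networks are trained.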

Problem

Research questions and friction points this paper is trying to address.

Estimating latent space representations of body shapes and poses

Learning geometry on latent space for motion approximation

Performing motion operations with limited computational cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised neural networks for 3D human body analysis

VariShaPE architecture for latent shape and pose estimation

MoGeN framework for motion geometry via linear interpolation
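
The summary above does not describe how VariShaPE processes an input mesh. As background on the term in its name, a varifold description of a triangle mesh keeps only each face's center, unit normal, and area, so it does not depend on how vertices are numbered or on correspondence with a template. The sketch below shows that general construction in NumPy; the function name `discrete_varifold` is illustrative and not taken from the paper.

```python
import numpy as np


def discrete_varifold(vertices, faces):
    """Per-face centers, unit normals, and areas of a triangle mesh.

    vertices: (V, 3) float array, faces: (F, 3) integer index array.
    """
    tri = vertices[faces]                      # (F, 3, 3) corner coordinates
    centers = tri.mean(axis=1)                 # (F, 3) face barycenters
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    norms = np.linalg.norm(cross, axis=1)      # twice the face area
    areas = 0.5 * norms
    normals = cross / np.maximum(norms, 1e-12)[:, None]  # guard degenerate faces
    return centers, normals, areas


# Unit square in the z = 0 plane, split into two triangles.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [0, 2, 3]])
centers, normals, areas = discrete_varifold(vertices, faces)

# Renumber the vertices: the mesh is the same surface with different indexing.
perm = np.array([2, 0, 3, 1])
inverse = np.argsort(perm)                     # old index -> new index
centers_p, normals_p, areas_p = discrete_varifold(vertices[perm], inverse[faces])
```

In this example the renumbered mesh yields the same centers, normals, and areas as the original, which illustrates why such a representation suits unregistered scans.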

🔎 Similar Papers

Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond