🤖 AI Summary
Existing parametric human body models, such as SMPL and SMPL-X, are difficult to unify due to incompatibilities in mesh topology, skeletal structure, and shape parameterization. This work proposes a three-layer abstraction—mesh topology, skeleton, and pose—to construct a unified human representation that enables seamless conversion across models. By leveraging constant-time vertex mapping, closed-form joint transformation recovery, and inverse skinning for pose extraction, the method reduces the complexity of many-to-many model adaptation from O(M²) to O(M). Integrated within a fully differentiable architecture and accelerated via NVIDIA Warp on GPUs, the approach requires neither iterative optimization nor model-specific training. It supports arbitrary mixing of identity and motion from compatible models at inference time, significantly enhancing flexibility and efficiency in human reconstruction, animation, and simulation.
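The O(M²)-to-O(M) reduction comes from routing every conversion through a single canonical representation: instead of M·(M−1) directed pairwise adapters, each model needs only one to/from-canonical connector pair. A minimal sketch of that hub-and-spoke structure (all names and the toy "parameters" are hypothetical, not the paper's API):

```python
# Hypothetical hub-and-spoke sketch: every source-to-target conversion
# composes two per-model connectors through one canonical hub, so M models
# need M connector pairs rather than M*(M-1) pairwise adapters.

class Connector:
    """Per-model adapter to/from a shared canonical body (hypothetical)."""
    def __init__(self, name, to_canonical, from_canonical):
        self.name = name
        self.to_canonical = to_canonical      # model params -> canonical
        self.from_canonical = from_canonical  # canonical -> model params

registry = {}

def register(connector):
    registry[connector.name] = connector

def convert(params, src, dst):
    # Any pair of registered models is reachable via the canonical hub.
    canonical = registry[src].to_canonical(params)
    return registry[dst].from_canonical(canonical)

# Toy models whose "parameters" are scaled floats (stand-ins for shape/pose).
register(Connector("A", lambda p: p * 2.0, lambda c: c / 2.0))
register(Connector("B", lambda p: p * 4.0, lambda c: c / 4.0))

print(convert(1.0, "A", "B"))  # A-units -> canonical -> B-units: 0.5
```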
📝 Abstract
Parametric human body models are foundational to human reconstruction, animation, and simulation, yet they remain mutually incompatible: SMPL, SMPL-X, MHR, Anny, and related models each diverge in mesh topology, skeletal structure, shape parameterization, and unit convention, making it impractical to exploit their complementary strengths within a single pipeline. We present SOMA, a unified body layer that bridges these heterogeneous representations through three abstraction layers. Mesh topology abstraction maps any source model's identity to a shared canonical mesh in constant time per vertex. Skeletal abstraction recovers a full set of identity-adapted joint transforms from any body shape, whether in rest pose or an arbitrary posed configuration, in a single closed-form pass, with no iterative optimization or per-model training. Pose abstraction inverts the skinning pipeline to recover unified skeleton rotations directly from posed vertices of any supported model, enabling heterogeneous motion datasets to be consumed without custom retargeting. Together, these layers reduce the $O(M^2)$ per-pair adapter problem to $O(M)$ single-backend connectors, letting practitioners freely mix identity sources and pose data at inference time. The entire pipeline is fully differentiable end-to-end and GPU-accelerated via NVIDIA Warp.
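"Inverting the skinning pipeline" can be illustrated on standard linear blend skinning (LBS): each posed vertex is the product of a weight-blended joint matrix and its rest-pose position, so given known transforms the map inverts in closed form per vertex. The sketch below shows only this invertibility under the standard LBS model; recovering the joint rotations themselves from posed vertices, as the pose abstraction layer does, is the harder step and is not shown, and the paper's actual formulation may differ:

```python
import numpy as np

# Standard LBS: v_posed = (sum_j w_j * G_j) @ v_rest, with G_j a 4x4 joint
# transform and per-vertex weights w_j summing to 1. Inverse skinning undoes
# it per vertex: v_rest = (sum_j w_j * G_j)^{-1} @ v_posed.

def lbs(v_rest, weights, transforms):
    """Pose homogeneous vertices (N,4) with weights (N,J), transforms (J,4,4)."""
    blended = np.einsum("nj,jab->nab", weights, transforms)  # (N,4,4)
    return np.einsum("nab,nb->na", blended, v_rest)

def inverse_lbs(v_posed, weights, transforms):
    """Recover rest-pose vertices by inverting each blended skinning matrix."""
    blended = np.einsum("nj,jab->nab", weights, transforms)
    return np.einsum("nab,nb->na", np.linalg.inv(blended), v_posed)

# Toy example: two joints, three vertices.
rng = np.random.default_rng(0)
G = np.stack([np.eye(4), np.eye(4)])
G[1, :3, 3] = [0.0, 1.0, 0.0]                  # second joint translated up
w = rng.random((3, 2)); w /= w.sum(1, keepdims=True)
v = np.concatenate([rng.random((3, 3)), np.ones((3, 1))], axis=1)

posed = lbs(v, w, G)
recovered = inverse_lbs(posed, w, G)
assert np.allclose(recovered, v)               # round-trip is exact
```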