ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing auto-rigging methods for 3D meshes do not model the distribution of plausible joint poses, so sampled or hand-edited poses are often anatomically implausible or self-intersecting. This work proposes ViPS, a framework that distills motion priors from a pretrained 2D video diffusion model into a general-purpose 3D pose distribution, without relying on scarce 4D data. By combining differentiable geometric validators with latent-space pose modeling, ViPS supports pose sampling, manifold projection for inverse kinematics, and temporally coherent keyframe trajectories. Experiments show that ViPS, trained solely on video priors, matches state-of-the-art methods trained on synthetic artist-created 4D data in both pose plausibility and diversity, and generalizes zero-shot to out-of-distribution species and unseen skeletal topologies.
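The summary does not say how the differentiable geometric validator is built. As a rough illustration only, one common way to make a self-intersection check differentiable is to approximate body parts with sphere proxies and penalize overlap with a smooth function. Everything in this sketch (the sphere proxies, `self_intersection_penalty`, the `skip_pairs` argument, and the softplus sharpness `beta`) is an assumption for illustration, not the validator used in ViPS.

```python
import math


def softplus(x, beta=20.0):
    """Smooth approximation of max(x, 0) that is differentiable everywhere."""
    t = beta * x
    # Numerically stable form: avoids overflow for large positive t.
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / beta


def self_intersection_penalty(centers, radii, skip_pairs=frozenset()):
    """Sum of smooth overlap penalties between sphere proxies.

    centers:    list of (x, y, z) proxy centres on the posed mesh (hypothetical).
    radii:      list of proxy radii, one per centre.
    skip_pairs: index pairs (i, j) with i < j that may touch, e.g. adjacent bones.
    """
    total = 0.0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if (i, j) in skip_pairs:
                continue
            distance = math.dist(centers[i], centers[j])
            # Positive when the two spheres overlap, negative when they are apart.
            overlap = radii[i] + radii[j] - distance
            total += softplus(overlap)
    return total


apart = self_intersection_penalty([(0, 0, 0), (3, 0, 0)], [1.0, 1.0])
overlapping = self_intersection_penalty([(0, 0, 0), (1, 0, 0)], [1.0, 1.0])
```

The penalty is close to zero for separated proxies and grows roughly linearly with penetration depth, so it can be added to a training or optimization loss without hand-tuned per-asset regularizers.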

📝 Abstract

Kinematic rigs provide a structured interface for articulating 3D meshes, but they lack an inherent representation of the plausible manifold of joint configurations for a given asset. Without such a pose space, stochastic sampling or manual manipulation of raw rig parameters often leads to semantic or geometric violations, such as anatomical hyperextension and non-physical self-intersections. We propose Video-informed Pose Spaces (ViPS), a feed-forward framework that discovers the latent distribution of valid articulations for auto-rigged meshes by distilling motion priors from a pretrained video diffusion model. Unlike existing methods that rely on scarce artist-authored 4D datasets, ViPS transfers generative video priors into a universal distribution over a given rig parameterization. Differentiable geometric validators applied to the skinned mesh enforce asset-specific validity without requiring manual regularizers. Our model learns a smooth, compact, and controllable pose space that supports diverse sampling, manifold projection for inverse kinematics, and temporally coherent trajectories for keyframing. Furthermore, the distilled 3D pose samples serve as precise semantic proxies for guiding video diffusion, effectively closing the loop between generative 2D priors and structured 3D kinematic control. Our evaluations show that ViPS, trained solely on video priors, matches the performance of state-of-the-art methods trained on synthetic artist-created 4D data in both plausibility and diversity. Most importantly, as a universal model, ViPS demonstrates robust zero-shot generalization to out-of-distribution species and unseen skeletal topologies.
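To make "manifold projection for inverse kinematics" concrete, the toy sketch below optimizes a latent code instead of raw joint angles, so every candidate pose stays inside the valid set by construction. It assumes a planar two-bone chain and a hand-written `decode` function that squashes the latent into hypothetical joint limits; in ViPS the pose space is learned, and none of the names or values here (`LIMITS`, `BONES`, `decode`, `project_ik`) come from the paper.

```python
import math

# Hypothetical joint limits (radians) and bone lengths for a planar two-bone chain.
LIMITS = [(-math.pi / 2, math.pi / 2), (0.0, 2.5)]
BONES = [1.0, 1.0]


def decode(z):
    """Map an unconstrained latent z to joint angles that always respect LIMITS."""
    return [lo + (hi - lo) * 0.5 * (math.tanh(zi) + 1.0)
            for zi, (lo, hi) in zip(z, LIMITS)]


def forward_kinematics(angles):
    """Return the end-effector position of the chain for the given joint angles."""
    x = y = accumulated = 0.0
    for angle, length in zip(angles, BONES):
        accumulated += angle
        x += length * math.cos(accumulated)
        y += length * math.sin(accumulated)
    return x, y


def loss(z, target):
    """Squared distance between the end effector and the target."""
    x, y = forward_kinematics(decode(z))
    return (x - target[0]) ** 2 + (y - target[1]) ** 2


def project_ik(target, z0=(0.0, 0.0), steps=2000, lr=0.05, eps=1e-5):
    """Gradient descent in latent space with central finite differences."""
    z = list(z0)
    for _ in range(steps):
        grad = []
        for k in range(len(z)):
            z_plus = z[:]
            z_plus[k] += eps
            z_minus = z[:]
            z_minus[k] -= eps
            grad.append((loss(z_plus, target) - loss(z_minus, target)) / (2 * eps))
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return decode(z)


angles = project_ik((1.0, 1.0))
reached = forward_kinematics(angles)
```

Because the search runs over `z` rather than over the angles, the solver cannot return a hyperextended elbow even when an unconstrained solver would find one. A learned pose space plays the same role with a far richer notion of validity than fixed per-joint limits.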

Problem

Research questions and friction points this paper is trying to address.

pose space

auto-rigged meshes

kinematic rigs

plausible articulations

geometric validity

Innovation

Methods, ideas, or system contributions that make the work stand out.

pose space

video diffusion

auto-rigging