Rigel3D: Rig-aware Latents for Animation-Ready 3D Asset Generation

📅 2026-05-13
📈 Citations: 0
Influential: 0
📄 PDF

career value

196K/year
🤖 AI Summary
Existing 3D generative models struggle to produce assets that are directly animatable due to the absence of essential rigging components such as skeletal structures, joint hierarchies, and skinning weights. This work proposes an implicit representation that jointly models geometry and skeleton structure, enabling the conditional generation—driven by input images—of meshes, skeleton topologies, joint locations, and skinning weights in a unified framework. Key innovations include the first approach to co-generate geometry and rigging during synthesis, an open-vocabulary joint labeling module for semantic alignment with arbitrary retargeting templates, and the integration of a rig-aware autoencoder, a two-stage latent generative model, and vision-language embeddings. Experiments on a large-scale rigged dataset demonstrate that the method generates diverse, high-quality, and immediately animatable 3D assets, outperforming existing rigging approaches across multiple metrics.
📝 Abstract
Recent 3D generative models can synthesize high-quality assets, but their outputs are typically static: they lack the skeletal rigs, joint hierarchies, and skinning weights required for animation. This limits their use in games, film, simulation, virtual agents, and embodied AI, where assets must not only look plausible but also move plausibly. We introduce Rigel3D, a generative method for animation-ready 3D assets represented as rigged meshes. Unlike post-hoc auto-rigging methods that attach rigs to completed shapes, our method jointly models geometry and rig structure through coupled surface and skeleton structured latent representations. A rig-aware autoencoder decodes these representations into mesh geometry, skeleton topology, joint coordinates, and skinning weights, while a two-stage latent generative model synthesizes both surface and skeleton representations for image-conditioned generation. To support downstream animation workflows, we further introduce an open-vocabulary joint labeling module that embeds generated joints into a shared vision-language space, enabling correspondence to arbitrary retargeting templates. Experiments on large-scale rigged asset datasets demonstrate that our method generates diverse, high-quality animation-ready assets and outperforms existing rigging baselines across multiple metrics.
Problem

Research questions and friction points this paper is trying to address.

animation-ready 3D assets
skeletal rigs
skinning weights
joint hierarchies
3D generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

rig-aware generation
animation-ready 3D assets
structured latent representation
joint skinning
vision-language joint labeling