AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation

📅 2026-04-09
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing 3D generation methods struggle to directly produce animatable assets with valid skeletons and skinning weights; rigging the output post hoc often yields skeletons that are topologically inconsistent with the generated geometry. This work proposes AniGen, a unified framework that generates animatable 3D assets directly from a single image by jointly modeling shape, skeleton, and skinning as mutually consistent $S^3$ fields over a shared spatial domain. AniGen uses a two-stage flow-matching pipeline: it first generates a sparse structural scaffold, then synthesizes dense geometry and articulation in a structured latent space. To handle the geometric ambiguity of bone prediction at Voronoi boundaries, it introduces a confidence-decaying skeleton field. A dual skin feature field decouples skinning weights from the joint count, so a fixed-architecture network can predict rigs of arbitrary complexity. Experiments show that AniGen substantially outperforms sequential baselines in rig validity and animation quality, and generalizes to in-the-wild images of humanoids, animals, and machinery.
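
One way to picture how skinning weights can be decoupled from the joint count is sketched below. This is an illustration under stated assumptions, not AniGen's actual formulation: it assumes each surface point and each joint carries a feature vector of the same dimension, and that weights come from a softmax over their dot products. The function name `skinning_weights`, the `temperature` parameter, and the feature dimension are all hypothetical.

```python
import numpy as np

def skinning_weights(point_feats, joint_feats, temperature=1.0):
    """Softmax over point-joint feature similarity (assumed formulation).

    point_feats: (P, D) one feature vector per surface point (hypothetical)
    joint_feats: (J, D) one feature vector per joint (hypothetical)
    returns:     (P, J) weights; each row is non-negative and sums to 1
    """
    logits = point_feats @ joint_feats.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 8))

# The same function handles rigs with different joint counts,
# because J only appears as a dimension of the input.
for num_joints in (3, 17):
    joints = rng.normal(size=(num_joints, 8))
    w = skinning_weights(points, joints)
    print(num_joints, w.shape)
```

Because the joint count enters only through the shape of `joint_feats`, nothing in the function has to change when a skeleton has more or fewer joints, which is the property the summary attributes to the dual field design.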

📝 Abstract
Animatable 3D assets, defined as geometry equipped with an articulated skeleton and skinning weights, are fundamental to interactive graphics, embodied agents, and animation production. While recent 3D generative models can synthesize visually plausible shapes from images, the results are typically static. Obtaining usable rigs via post-hoc auto-rigging is brittle and often produces skeletons that are topologically inconsistent with the generated geometry. We present AniGen, a unified framework that directly generates animate-ready 3D assets conditioned on a single image. Our key insight is to represent shape, skeleton, and skinning as mutually consistent $S^3$ Fields (Shape, Skeleton, Skin) defined over a shared spatial domain. To enable the robust learning of these fields, we introduce two technical innovations: (i) a confidence-decaying skeleton field that explicitly handles the geometric ambiguity of bone prediction at Voronoi boundaries, and (ii) a dual skin feature field that decouples skinning weights from specific joint counts, allowing a fixed-architecture network to predict rigs of arbitrary complexity. Built upon a two-stage flow-matching pipeline, AniGen first synthesizes a sparse structural scaffold and then generates dense geometry and articulation in a structured latent space. Extensive experiments demonstrate that AniGen substantially outperforms state-of-the-art sequential baselines in rig validity and animation quality, generalizing effectively to in-the-wild images across diverse categories including animals, humanoids, and machinery. Homepage: https://yihua7.github.io/AniGen-web/
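
The ambiguity at Voronoi boundaries can be illustrated with a small sketch. It is not the paper's skeleton field: joints are simplified to points rather than bones, and the confidence is an assumed function of the gap between the nearest and second-nearest joint distances, so it falls to zero exactly where two joints are equally close. The function name `nearest_joint_with_confidence` and the `sigma` parameter are hypothetical.

```python
import numpy as np

def nearest_joint_with_confidence(points, joints, sigma=0.1):
    """Assign each point to its nearest joint, with a boundary-aware confidence.

    points: (P, 3) query positions
    joints: (J, 3) joint positions, J >= 2 (simplified stand-in for bones)
    sigma:  hypothetical length scale controlling how fast confidence decays
    """
    # Pairwise Euclidean distances, shape (P, J).
    d = np.linalg.norm(points[:, None, :] - joints[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)
    nearest = order[:, 0]
    d_sorted = np.take_along_axis(d, order, axis=1)
    # The margin is zero exactly on a Voronoi boundary between two joints.
    margin = d_sorted[:, 1] - d_sorted[:, 0]
    confidence = 1.0 - np.exp(-margin / sigma)
    return nearest, confidence

joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
points = np.array([[0.05, 0.0, 0.0], [0.5, 0.0, 0.0], [0.95, 0.0, 0.0]])
nearest, confidence = nearest_joint_with_confidence(points, joints)
print(nearest, confidence)
```

The middle point is equidistant from both joints, so its assignment is arbitrary and its confidence is zero, while the two outer points get a clear assignment with confidence close to one. Down-weighting predictions in this way keeps ambiguous boundary samples from dominating training, under the assumptions stated above.
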
Problem

Research questions and friction points this paper is trying to address.

Animatable 3D assets
Auto-rigging
Skeleton-geometry inconsistency
3D generative models
Skinning weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

$S^3$ Fields
Animatable 3D generation
Skeleton prediction
Skinning weight decoupling
Flow matching