AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation

📅 2026-04-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 3D generation methods rarely produce animatable assets directly: rigs added by post-hoc auto-rigging are often invalid or topologically inconsistent with the generated geometry. This work proposes AniGen, a unified framework that generates animatable 3D assets directly from a single image by jointly modeling shape, skeleton, and skinning as mutually consistent $S^3$ fields over a shared spatial domain. AniGen uses a two-stage flow-matching pipeline: it first generates a sparse structural scaffold, then synthesizes dense geometry and articulation in a structured latent space. To handle the geometric ambiguity of bone prediction at Voronoi boundaries, it introduces a confidence-decaying skeleton field; a dual skin feature field decouples skinning weights from the joint count, so a fixed-architecture network can predict rigs of arbitrary complexity. Experiments show that AniGen substantially outperforms state-of-the-art sequential baselines in rig validity and animation quality, and generalizes to in-the-wild images of humanoids, animals, and machinery.
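To make the Voronoi-boundary ambiguity concrete, here is a minimal sketch of a skeleton field whose confidence decays toward such boundaries. It is an illustration only, not the paper's formulation: the function name `skeleton_field`, the exponential decay, and the `sigma` parameter are all assumptions. Each query point is assigned its nearest joint, and the confidence falls to zero where the two nearest joints are equidistant, which is exactly where the assignment is ambiguous.

```python
import math

def skeleton_field(point, joints, sigma=0.1):
    """Return (nearest joint index, offset to that joint, confidence).

    Hypothetical sketch: the confidence is 1 - exp(-margin / sigma), where
    margin is the gap between the distances to the two nearest joints.
    The margin is zero on a Voronoi boundary, so the confidence is zero there.
    Requires at least two joints.
    """
    dists = sorted((math.dist(point, j), i) for i, j in enumerate(joints))
    (d1, nearest), (d2, _) = dists[0], dists[1]
    margin = d2 - d1
    confidence = 1.0 - math.exp(-margin / sigma)
    offset = tuple(j - p for p, j in zip(point, joints[nearest]))
    return nearest, offset, confidence

# Two joints on the x-axis; their Voronoi boundary is the plane x = 0.5.
joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near_joint = skeleton_field((0.1, 0.0, 0.0), joints)
on_boundary = skeleton_field((0.5, 0.0, 0.0), joints)
```

In a training setup of this kind, such a confidence value could down-weight the loss on points whose target joint is ill-defined; whether AniGen uses it that way is not stated in this summary.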

📝 Abstract

Animatable 3D assets, defined as geometry equipped with an articulated skeleton and skinning weights, are fundamental to interactive graphics, embodied agents, and animation production. While recent 3D generative models can synthesize visually plausible shapes from images, the results are typically static. Obtaining usable rigs via post-hoc auto-rigging is brittle and often produces skeletons that are topologically inconsistent with the generated geometry. We present AniGen, a unified framework that directly generates animate-ready 3D assets conditioned on a single image. Our key insight is to represent shape, skeleton, and skinning as mutually consistent $S^3$ Fields (Shape, Skeleton, Skin) defined over a shared spatial domain. To enable the robust learning of these fields, we introduce two technical innovations: (i) a confidence-decaying skeleton field that explicitly handles the geometric ambiguity of bone prediction at Voronoi boundaries, and (ii) a dual skin feature field that decouples skinning weights from specific joint counts, allowing a fixed-architecture network to predict rigs of arbitrary complexity. Built upon a two-stage flow-matching pipeline, AniGen first synthesizes a sparse structural scaffold and then generates dense geometry and articulation in a structured latent space. Extensive experiments demonstrate that AniGen substantially outperforms state-of-the-art sequential baselines in rig validity and animation quality, generalizing effectively to in-the-wild images across diverse categories including animals, humanoids, and machinery. Homepage: https://yihua7.github.io/AniGen-web/
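One way to see how skinning weights can be decoupled from the joint count is the sketch below. It assumes, purely for illustration, that one field yields a feature vector per surface point and another yields a feature vector per joint, and that weights are a softmax over their dot products. The abstract does not specify this mechanism; `skin_weights`, the dot-product similarity, and `temperature` are hypothetical choices.

```python
import math

def skin_weights(point_feat, joint_feats, temperature=1.0):
    """Softmax over point-joint feature similarities.

    Hypothetical sketch: the output length equals len(joint_feats), so the
    same function handles rigs with any number of joints without changing
    the feature dimension.
    """
    scores = [
        sum(p * j for p, j in zip(point_feat, jf)) / temperature
        for jf in joint_feats
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

point = (1.0, 0.0)
two_joint_rig = [(1.0, 0.0), (0.0, 1.0)]
five_joint_rig = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.5, 0.5)]

w2 = skin_weights(point, two_joint_rig)
w5 = skin_weights(point, five_joint_rig)
```

Because the joint count only affects how many similarity scores are computed, the per-point feature dimension stays fixed, which is the property the abstract attributes to the dual skin feature field.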

Problem

Research questions and friction points this paper is trying to address.

Animatable 3D assets

Auto-rigging

Skeleton-geometry inconsistency

3D generative models

Skinning weights

Innovation

Methods, ideas, or system contributions that make the work stand out.

$S^3$ Fields

Animatable 3D generation

Skeleton prediction