StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D generation methods typically produce monolithic meshes, which lack the structural flexibility and semantic decomposition required by the gaming and animation industries. To address this limitation, this work proposes a Dual-Branch Semantic-Aware Large Reconstruction Model (Dual-Branch S-LRM) that jointly reconstructs high-fidelity geometry, color, and part-level semantics in an end-to-end manner, using a hybrid implicit field representation together with a coarse-to-fine sampling strategy. A video-diffusion-based texture disentanglement module further separates semantically meaningful layers such as irises and skin. The proposed method significantly outperforms existing approaches in both geometric accuracy and semantic disentanglement, enabling non-destructive editing, physics-compliant animation, and gaze tracking, and thus offers a practical solution for automated character asset production.
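
To make the coarse-to-fine sampling idea concrete, below is a minimal NumPy sketch of the generic pattern: evaluate an implicit field on a low-resolution grid, keep only cells near the zero level set, and subdivide just those cells before meshing. Everything here is an illustrative assumption (the function `coarse_to_fine_occupancy`, the resolutions, and the toy sphere standing in for the learned field); the actual Dual-Branch S-LRM queries a hybrid implicit field that also returns color and part semantics.

```python
import numpy as np

def coarse_to_fine_occupancy(sdf_fn, bbox_min, bbox_max,
                             coarse_res=32, factor=4, band=1.5):
    """Evaluate an implicit field coarsely, then refine only cells
    near the surface. Illustrative coarse-to-fine proposal scheme,
    not the paper's exact algorithm."""
    # Coarse pass: evaluate the field on a low-resolution grid.
    xs = [np.linspace(lo, hi, coarse_res) for lo, hi in zip(bbox_min, bbox_max)]
    grid = np.stack(np.meshgrid(*xs, indexing="ij"), axis=-1)      # (R,R,R,3)
    coarse = sdf_fn(grid.reshape(-1, 3)).reshape(grid.shape[:3])
    cell = (bbox_max - bbox_min) / (coarse_res - 1)
    # Proposal: keep only cells within `band` cell-diagonals of the surface.
    near = np.abs(coarse) < band * np.linalg.norm(cell)
    idx = np.argwhere(near)
    # Fine pass: subdivide the proposed cells only, so the full
    # high-resolution grid is never materialized in memory.
    offs = np.stack(np.meshgrid(*[np.linspace(0, 1, factor)] * 3,
                                indexing="ij"), axis=-1).reshape(-1, 3)
    fine_pts = []
    for i in idx:
        origin = bbox_min + i * cell
        fine_pts.append(origin + offs * cell)
    fine_pts = np.concatenate(fine_pts, axis=0)
    return fine_pts, sdf_fn(fine_pts)

# Toy field: a unit sphere stands in for the learned implicit field.
sphere = lambda p: np.linalg.norm(p, axis=-1) - 1.0
pts, d = coarse_to_fine_occupancy(sphere, np.array([-1.5] * 3), np.array([1.5] * 3))
print(f"refined {len(pts)} points near the surface")
```

The memory saving comes from the proposal step: only cells within a narrow band of the surface are refined, which is what makes high-resolution mesh extraction tractable.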

📝 Abstract
We present StdGEN++, a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs. Existing 3D generative methods often produce monolithic meshes that lack the structural flexibility required by industrial pipelines in gaming and animation. Addressing this gap, StdGEN++ is built upon a Dual-Branch Semantic-Aware Large Reconstruction Model (Dual-Branch S-LRM), which jointly reconstructs geometry, color, and per-component semantics in a feed-forward manner. To achieve production-level fidelity, we introduce a novel semantic surface extraction formalism compatible with hybrid implicit fields. This mechanism is accelerated by a coarse-to-fine proposal scheme, which significantly reduces memory footprint and enables high-resolution mesh generation. Furthermore, we propose a video-diffusion-based texture decomposition module that disentangles appearance into editable layers (e.g., separated iris and skin), resolving semantic confusion in facial regions. Experiments demonstrate that StdGEN++ achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement. Crucially, the resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking, making it a robust solution for automated character asset production.
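
As a rough sketch of what joint feed-forward prediction of geometry, color, and semantics can look like at the query level, the PyTorch skeleton below shares a trunk between two branches: one regressing signed distance plus RGB, the other producing part-semantic logits. The module name `DualBranchField`, all layer sizes, and the per-point conditioning tensor are assumptions for illustration only, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DualBranchField(nn.Module):
    """Illustrative skeleton of a dual-branch implicit field: a shared
    trunk feeds one head for geometry+color and one for part semantics.
    Layer sizes and conditioning are assumptions, not the paper's spec."""
    def __init__(self, feat_dim=64, n_parts=8, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Branch 1: signed distance (1 channel) + RGB color (3 channels).
        self.geo_color = nn.Linear(hidden, 1 + 3)
        # Branch 2: per-part semantic logits.
        self.semantics = nn.Linear(hidden, n_parts)

    def forward(self, points, cond):
        # points: (B,N,3) query locations; cond: (B,N,feat_dim) per-point
        # features (e.g., projected image tokens in an LRM-style model).
        h = self.trunk(torch.cat([points, cond], dim=-1))
        sdf_rgb = self.geo_color(h)
        sdf, rgb = sdf_rgb[..., :1], torch.sigmoid(sdf_rgb[..., 1:])
        part_logits = self.semantics(h)
        return sdf, rgb, part_logits

model = DualBranchField()
pts = torch.randn(2, 1024, 3)
cond = torch.randn(2, 1024, 64)
sdf, rgb, logits = model(pts, cond)
print(sdf.shape, rgb.shape, logits.shape)  # (2,1024,1) (2,1024,3) (2,1024,8)
```

In this sketch, both branches read the same trunk features, so geometry and semantics are queried in lockstep and stay spatially aligned, the property that per-component surface extraction relies on.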
Problem

Research questions and friction points this paper is trying to address.

3D character generation
semantic decomposition
structural flexibility
industrial pipelines
editable assets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-Decomposed 3D Generation
Dual-Branch S-LRM
Hybrid Implicit Fields
Texture Disentanglement
Coarse-to-Fine Proposal