Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation

📅 2025-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text-to-3D methods misinterpret semantics and misbind attributes on occluded parts when generating objects with complex attributes, primarily because text encoders handle long prompts poorly and parts are generated without structure. This work proposes a hierarchical chained generation framework, Hierarchical-Chain-of-Generation (HCoG): (1) an LLM automatically decomposes a lengthy input prompt into semantically coherent part descriptions and orders them from the inside out according to occlusion; (2) within each part, target-region localization binds attributes to the correct region; and (3) building on 3D Gaussian splatting, Gaussian Extension adds new kernels for each new part while Label Elimination re-assigns semantic labels and removes unnecessary kernels, so that parts are optimized in a decoupled way. According to the summary, this is the first approach to deeply integrate LLM-driven structural parsing of text with differentiable 3D generation, ensuring structural consistency and attribute fidelity without manual intervention. Experiments are reported to show significant improvements in geometric coherence and attribute accuracy for complex objects, outperforming state-of-the-art methods both qualitatively and quantitatively. The code is publicly available.

📝 Abstract

Recent text-to-3D models can render high-quality assets, yet they still stumble on objects with complex attributes. The key obstacles are: (1) existing text-to-3D approaches typically lift text-to-image models and extract semantics via their text encoders, but these encoders have limited comprehension of long descriptions, which skews cross-attention focus and in turn causes wrong attribute binding in the generated results; (2) occluded object parts demand a disciplined generation order and explicit part disentanglement. Though some works introduce manual effort to alleviate these issues, their quality is unstable and highly reliant on manually supplied information. To tackle the above problems, we propose an automated method, Hierarchical-Chain-of-Generation (HCoG). It leverages a large language model to decompose the long description into blocks representing different object parts, and orders them from inside out according to occlusions, forming a hierarchical chain. Within each block we first coarsely create components, then precisely bind attributes via target-region localization and corresponding 3D Gaussian kernel optimization. Between blocks, we introduce Gaussian Extension and Label Elimination to seamlessly generate new parts by extending new Gaussian kernels, re-assigning semantic labels, and eliminating unnecessary kernels, ensuring that only relevant parts are added without disrupting previously optimized parts. Experiments confirm that HCoG yields structurally coherent, attribute-faithful 3D objects with complex attributes. The code is available at https://github.com/Wakals/GASCOL.
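
The inside-out ordering described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the LLM has already returned the part blocks, and the `PartBlock` class, its `occluded_by` field, and the `build_chain` function are hypothetical names introduced here. The ordering itself is a plain topological sort over the "is covered by" relation, which is one plausible way to realize a hierarchical chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartBlock:
    # Hypothetical container for one LLM-produced part description.
    name: str
    description: str
    occluded_by: tuple = ()  # names of parts that cover this part (assumed field)


def build_chain(blocks):
    """Order blocks inside-out: a part comes before every part that covers it."""
    by_name = {b.name: b for b in blocks}
    # pending[x] holds the inner parts that must be generated before x.
    pending = {b.name: set() for b in blocks}
    for b in blocks:
        for outer in b.occluded_by:
            pending[outer].add(b.name)
    chain = []
    while pending:
        # Parts with no remaining inner parts are ready; sort for determinism.
        ready = sorted(name for name, inner in pending.items() if not inner)
        if not ready:
            raise ValueError("occlusion relations contain a cycle")
        for name in ready:
            chain.append(by_name[name])
            del pending[name]
        for inner in pending.values():
            inner.difference_update(ready)
    return chain


# Illustrative input only; in HCoG the blocks would come from the LLM.
blocks = [
    PartBlock("jacket", "a red leather jacket"),
    PartBlock("body", "a tall man", occluded_by=("shirt",)),
    PartBlock("shirt", "a white linen shirt", occluded_by=("jacket",)),
]
chain_names = [b.name for b in build_chain(blocks)]
print(chain_names)  # ['body', 'shirt', 'jacket']
```

Generating in this order means each newly added part only has to fit around geometry that already exists, which is the motivation the abstract gives for a disciplined generation order.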

Problem

Research questions and friction points this paper is trying to address.

Improving text-to-3D generation for objects with complex attributes

Addressing limited text encoder comprehension for long descriptions

Automating part disentanglement and disciplined generation order

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical-Chain-of-Generation for complex 3D objects

Decomposition of long descriptions via a large language model

Gaussian Extension and Label Elimination technique
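
As a rough illustration of the Gaussian Extension and Label Elimination idea listed above, the sketch below keeps a labeled set of kernels, adds new kernels for a new part, and prunes only the new ones. Everything here is an assumption made for illustration: the `Kernel` class, the random initialization, and the `keep` predicate stand in for the optimization and relevance criteria, which the summary does not specify. The one property it tries to mirror is that previously optimized parts are left untouched.

```python
from dataclasses import dataclass
import random


@dataclass
class Kernel:
    # Hypothetical, heavily simplified stand-in for a 3D Gaussian kernel.
    position: tuple
    label: str
    frozen: bool = False  # frozen kernels belong to already optimized parts


def extend(kernels, new_label, count, seed=0):
    """Freeze existing kernels and append `count` fresh ones for the new part."""
    rng = random.Random(seed)
    frozen = [Kernel(k.position, k.label, True) for k in kernels]
    fresh = [
        Kernel(tuple(rng.uniform(-1.0, 1.0) for _ in range(3)), new_label)
        for _ in range(count)
    ]
    return frozen + fresh


def eliminate(kernels, new_label, keep):
    """Drop new-part kernels that fail `keep`; never remove frozen kernels."""
    return [k for k in kernels if k.frozen or k.label != new_label or keep(k)]


# Illustrative run: a "body" part exists, a "hat" part is added on top.
base = [Kernel((0.0, 0.0, 0.0), "body") for _ in range(3)]
extended = extend(base, "hat", count=10)
# Placeholder relevance test (assumption): keep hat kernels in the upper half.
pruned = eliminate(extended, "hat", keep=lambda k: k.position[2] > 0)
```

In a real pipeline the `keep` decision would presumably depend on the semantic labels and the target region rather than a fixed geometric rule.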

🔎 Similar Papers

Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph