3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

📅 2024-09-19

🏛️ arXiv.org

📈 Citations: 6

✨ Influential: 1


🤖 AI Summary

Current 3D generative models suffer from slow optimization, low geometric fidelity, and a lack of native Physically Based Rendering (PBR) material support. To address these limitations, we propose PrimX, a differentiable, primitive-based native 3D representation that jointly encodes high-resolution geometry, albedo, and PBR material fields. On top of it we build a two-stage generative framework. The first stage (Primitive Patch Compression) compresses each primitive's local patch into a compact latent. The second stage (Latent Primitive Diffusion) uses a Diffusion Transformer to perform PBR-aware diffusion over the resulting structured set of primitive latents. The resulting model, 3DTopia-XL, supports text- or image-conditioned 3D generation and directly outputs PBR-ready assets. Qualitative and quantitative experiments indicate that 3DTopia-XL outperforms state-of-the-art methods in geometric accuracy, texture detail, and material realism, narrowing the quality gap between generative 3D modeling and practical physically based rendering applications.
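To make "primitive-based tensorial representation" concrete, here is a minimal NumPy sketch that lays out a PrimX-style tensor and queries it at a 3D point. It is an illustration under stated assumptions, not the paper's implementation. The primitive count `N`, patch resolution `R`, the six-channel layout (SDF, albedo RGB, metallic, roughness), treating each primitive as an axis-aligned cube of half-width `scale`, and the helper names `make_primx`, `flatten`, `trilinear`, and `query` are all hypothetical. Overlaps are resolved by taking the first primitive that contains the point, which is a simplification.

```python
import numpy as np

# Illustrative sizes (assumptions, not values taken from the paper):
N = 4  # number of primitives
R = 8  # voxel resolution per primitive patch (R x R x R)
C = 6  # channels: SDF (1) + albedo RGB (3) + metallic (1) + roughness (1)


def make_primx(rng: np.random.Generator) -> dict:
    """Random PrimX-style asset: per-primitive center, scale, and voxel patch."""
    return {
        "center": rng.uniform(-1.0, 1.0, size=(N, 3)),
        "scale": rng.uniform(0.05, 0.2, size=(N, 1)),
        "voxels": rng.standard_normal(size=(N, R, R, R, C)),
    }


def flatten(primx: dict) -> np.ndarray:
    """Pack the asset into one row per primitive: [center | scale | patch]."""
    patches = primx["voxels"].reshape(N, -1)
    return np.concatenate([primx["center"], primx["scale"], patches], axis=1)


def trilinear(patch: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Trilinearly interpolate an (R, R, R, C) patch at local coords u in [0, 1]^3."""
    g = u * (R - 1)
    i0 = np.clip(np.floor(g).astype(int), 0, R - 2)
    t = g - i0
    out = np.zeros(patch.shape[-1])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1 - t[0])
                     * (t[1] if dy else 1 - t[1])
                     * (t[2] if dz else 1 - t[2]))
                out += w * patch[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out


def query(primx: dict, x: np.ndarray):
    """Return the C channel values at point x, or None if no primitive covers x.

    Each primitive is treated as an axis-aligned cube centered at `center`
    with half-width `scale`; the first containing primitive wins.
    """
    for k in range(N):
        u = (x - primx["center"][k]) / (2.0 * primx["scale"][k, 0]) + 0.5
        if np.all((u >= 0.0) & (u <= 1.0)):
            return trilinear(primx["voxels"][k], u)
    return None


rng = np.random.default_rng(0)
asset = make_primx(rng)
print(flatten(asset).shape)  # (4, 3076)
```

Flattening yields one fixed-length row per primitive, so the whole asset becomes a single `(N, 4 + R**3 * C)` array. A fixed-size tensor like this is convenient input for a diffusion model, whereas a mesh with a varying vertex count is not.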


📝 Abstract

The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation. Despite recent advancements in 3D generative models, existing methods still face challenges with optimization speed, geometric fidelity, and the lack of assets for physically based rendering (PBR). In this paper, we introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome these limitations. 3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material fields into a compact tensorial format, facilitating the modeling of high-resolution geometry with PBR assets. On top of the novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression and 2) Latent Primitive Diffusion. 3DTopia-XL learns to generate high-quality 3D assets from textual or visual inputs. We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-quality 3D assets with fine-grained textures and materials, efficiently bridging the quality gap between generative models and real-world applications.
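The two stages named in the abstract can be illustrated with a small NumPy sketch. It rests on several labeled assumptions. PCA stands in for the learned Primitive Patch Compression network. The linear DDPM noise schedule and epsilon-prediction loss are standard textbook choices, not settings confirmed for 3DTopia-XL. All function names and sizes are hypothetical. Each primitive's flattened patch becomes one latent token, and the denoiser (a DiT in the paper) would be trained to predict the noise injected into the whole set of tokens.

```python
import numpy as np


def fit_patch_compressor(patches: np.ndarray, d: int):
    """Stand-in for Primitive Patch Compression: PCA to d dims via SVD.

    patches: (M, D) flattened primitive patches. Returns (encode, decode).
    The paper's compressor is learned; this linear version only shows the
    interface (patch -> compact latent -> patch).
    """
    mean = patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    basis = vt[:d]  # (d, D), orthonormal rows

    def encode(p: np.ndarray) -> np.ndarray:
        return (p - mean) @ basis.T

    def decode(z: np.ndarray) -> np.ndarray:
        return z @ basis + mean

    return encode, decode


def alpha_bar_schedule(T: int = 1000, beta_start: float = 1e-4,
                       beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(z0, t, alpha_bar, rng):
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return zt, eps


def eps_prediction_loss(model, z0, t, alpha_bar, rng) -> float:
    """Epsilon-prediction MSE; `model(zt, t)` is the denoiser (e.g. a DiT)."""
    zt, eps = q_sample(z0, t, alpha_bar, rng)
    return float(np.mean((model(zt, t) - eps) ** 2))


rng = np.random.default_rng(0)
patches = rng.standard_normal((64, 48))  # 64 primitives, 48-dim patches (toy sizes)
encode, decode = fit_patch_compressor(patches, d=8)
latents = encode(patches)                # (64, 8): one latent token per primitive
alpha_bar = alpha_bar_schedule()
# A trivial "denoiser" that always predicts zero noise, just to exercise the loss.
loss = eps_prediction_loss(lambda zt, t: np.zeros_like(zt), latents, 500, alpha_bar, rng)
```

Diffusing in the compressed latent space rather than on raw voxel patches is the usual motivation for latent diffusion: the denoiser sees far fewer dimensions per token, which makes scaling to many primitives cheaper.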

Problem

Research questions and friction points this paper is trying to address.

Efficient and automated 3D content creation

Overcoming challenges in optimization speed and geometric fidelity

Generating high-quality 3D assets with fine-grained textures and materials

Innovation

Methods, ideas, or system contributions that make the work stand out.

PrimX: compact tensorial 3D representation

Diffusion Transformer (DiT) generative framework: Primitive Patch Compression followed by Latent Primitive Diffusion

Generates 3D assets from text or image inputs
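Assuming the primitive latents are treated as an unordered set of tokens, it helps to see why a transformer suits them. Self-attention without positional encodings is permutation-equivariant, so reordering the primitive tokens simply reorders the outputs. The sketch below is a hypothetical single-head block. Conditioning on the diffusion timestep by adding a sinusoidal embedding to every token is a simplification (the original DiT design uses adaptive layer norm), and whether 3DTopia-XL omits positional encodings is an assumption here.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def timestep_embedding(t: float, dim: int) -> np.ndarray:
    """Sinusoidal embedding of the diffusion step t (dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])


def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product attention over all tokens; no positional encoding."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def token_block(tokens, t, wq, wk, wv):
    """Hypothetical block: add the timestep embedding to every token, then residual attention."""
    h = tokens + timestep_embedding(t, tokens.shape[-1])
    return h + self_attention(h, wq, wk, wv)


rng = np.random.default_rng(0)
dim = 16
tokens = rng.standard_normal((10, dim))  # 10 primitive latent tokens (toy size)
wq, wk, wv = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
perm = rng.permutation(10)
out = token_block(tokens, 250, wq, wk, wv)
out_perm = token_block(tokens[perm], 250, wq, wk, wv)
print(np.allclose(out[perm], out_perm))  # True
```

Because the same timestep embedding is added to every token and attention mixes tokens only through content similarity, no primitive is privileged by its index in the array.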
