Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation

📅 2025-02-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of high-fidelity 3D asset generation from diverse inputs (single images, multi-view images, or text). Methodologically, it introduces an implicit geometric VAE-diffusion joint modeling module for robust shape synthesis, and a multi-stage consistent texture synthesis module integrating RGB-to-PBR conversion with a novel consistency scheduling mechanism; additionally, a lightweight Artist-Created Mesh path is incorporated to accelerate inference. Key contributions include: (i) the first multi-stage texture consistency scheduling strategy, (ii) synergistic enhancement of geometric representation via VAE-diffusion co-training, and (iii) end-to-end generation of high-resolution, PBR-enabled 3D assets. Extensive experiments demonstrate significant improvements over state-of-the-art methods in geometric completeness, texture seamlessness, and rendering photorealism.

๐Ÿ“ Abstract
This report presents a comprehensive framework for generating high-quality 3D shapes and textures from diverse input prompts, including single images, multi-view images, and text descriptions. The framework consists of 3D shape generation and texture generation. (1) The 3D shape generation pipeline employs a Variational Autoencoder (VAE) to encode implicit 3D geometries into a latent space and a diffusion network to generate latents conditioned on input prompts, with modifications to enhance model capacity. An alternative Artist-Created Mesh (AM) generation approach is also explored, yielding promising results for simpler geometries. (2) Texture generation involves a multi-stage process starting with frontal image generation, followed by multi-view image generation, RGB-to-PBR texture conversion, and high-resolution multi-view texture refinement. A consistency scheduler is plugged into every stage to enforce pixel-wise consistency among multi-view textures during inference, ensuring seamless integration. The pipeline demonstrates effective handling of diverse input formats, leveraging advanced neural architectures and novel methodologies to produce high-quality 3D content. This report details the system architecture, experimental results, and potential future directions to improve and expand the framework. The source code and pretrained weights are released at: https://github.com/Tencent/Tencent-XR-3DGen.
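The shape pipeline described above (a VAE that maps implicit geometry into a latent space, plus a diffusion network that samples latents conditioned on a prompt) can be sketched at a very high level. Everything below is a toy illustration, not the paper's implementation: the dimensions, the encoder, the denoising rule, and the schedule are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the report does not state exact dimensions here.
LATENT_DIM = 64
COND_DIM = 32
STEPS = 10

def vae_encode(geometry: np.ndarray) -> np.ndarray:
    """Stand-in for the VAE encoder: map implicit geometry features
    (e.g. sampled SDF/occupancy values) into the latent space."""
    w = rng.standard_normal((geometry.shape[-1], LATENT_DIM)) * 0.1
    return geometry @ w

def denoise_step(z: np.ndarray, cond: np.ndarray, t: int) -> np.ndarray:
    """Stand-in for one conditional denoising step: pull the latent
    toward a condition-dependent target with a toy step schedule."""
    target = np.tanh(cond.mean()) * np.ones_like(z)
    alpha = 1.0 / (t + 1)  # illustrative schedule only
    return z + alpha * (target - z)

def generate_shape_latent(cond: np.ndarray) -> np.ndarray:
    """Sample a shape latent by iterative denoising, conditioned on a
    prompt embedding (image or text features in the real system)."""
    z = rng.standard_normal(LATENT_DIM)
    for t in reversed(range(STEPS)):
        z = denoise_step(z, cond, t)
    return z

# Training-time view: a geometry sample is encoded into the latent space.
train_latent = vae_encode(rng.standard_normal(128))
# Inference-time view: a new latent is generated from a prompt embedding.
cond = rng.standard_normal(COND_DIM)
gen_latent = generate_shape_latent(cond)
```

In the actual system the generated latent would be decoded by the VAE decoder back into an implicit 3D geometry; the sketch stops at the latent because the decoder architecture is not specified in the abstract.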
Problem

Research questions and friction points this paper is trying to address.

High-quality 3D shape generation
Texture synthesis from diverse inputs
Seamless multi-view texture integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

VAE and diffusion network for 3D shapes
Multi-stage texture generation with consistency scheduler
High-resolution multi-view texture refinement
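The consistency scheduler named above enforces pixel-wise agreement among multi-view textures during inference. One plausible way to picture the core operation is fusing per-view texel estimates in a shared UV space; the sketch below is an assumption-laden illustration (visibility-weighted averaging), not the paper's actual scheduling mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)

def enforce_consistency(view_textures: np.ndarray,
                        vis_masks: np.ndarray) -> np.ndarray:
    """Hypothetical consistency step: every UV texel visible from
    multiple views is replaced by the visibility-weighted average of
    those views, so overlapping regions agree pixel-wise.

    view_textures: (V, H, W, 3) per-view RGB textures in UV space
    vis_masks:     (V, H, W)    per-view visibility (0 or 1)
    """
    weights = vis_masks[..., None]                       # (V, H, W, 1)
    total = (view_textures * weights).sum(axis=0)        # (H, W, 3)
    count = np.clip(weights.sum(axis=0), 1e-8, None)     # (H, W, 1)
    fused = total / count                                # shared estimate
    # Write the fused value back into every view where the texel is visible.
    return np.where(weights > 0, fused, view_textures)

V, H, W = 4, 8, 8
views = rng.random((V, H, W, 3))                          # per-view textures
masks = (rng.random((V, H, W)) > 0.3).astype(float)       # visibility masks
fused_views = enforce_consistency(views, masks)
```

In the described pipeline this kind of fusion would be applied inside every texture-generation stage during inference, rather than once at the end, which is what makes it a scheduler rather than a post-process.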
Jiayu Yang
The Australian National University
3D Computer Vision, 3D AIGC, 3D Reconstruction, Multi-view Stereo, VR/AR/XR

Taizhang Shang
Tencent XR Vision Labs

Weixuan Sun
Tencent | PhD ANU
Computer vision, machine learning, natural language processing

Xibin Song
Tencent XR Vision Labs

Ziang Chen
Tencent XR Vision Labs

Senbo Wang
Tencent XR Vision Labs

Shenzhou Chen
Tencent XR Vision Labs

Weizhe Liu
ByteDance
Computer Vision, Machine Learning, Robotics

Hongdong Li
The Australian National University

Pan Ji
Ph.D., ex Tencent XR Vision Labs
Computer vision, machine learning, 3D vision, Graphics