🤖 AI Summary
Generating high-fidelity, geometrically precise, and photorealistic 3D assets from single images remains challenging. To address this, we propose a two-stage diffusion framework: first, a 10B-parameter LATTICE shape foundation model generates accurate, clean mesh geometry; second, physically based rendering (PBR) and multi-view consistency modeling synthesize realistic, view-coherent textures. Our method pioneers the integration of scalable, large-scale geometric priors with physics-driven texture synthesis, overcoming the dual bottlenecks of geometric fidelity and material realism inherent in conventional image-to-3D pipelines. Evaluated on ShapeNet and a newly curated high-quality dataset, our approach achieves state-of-the-art performance across mesh smoothness, shape sharpness, and end-to-end texture photorealism—demonstrating, for the first time, precise alignment from monocular image input to production-ready, high-fidelity 3D assets.
📝 Abstract
In this report, we present Hunyuan3D 2.5, a robust suite of 3D diffusion models aimed at generating high-fidelity and detailed textured 3D assets. Hunyuan3D 2.5 follows the two-stage pipeline of its predecessor, Hunyuan3D 2.0, while demonstrating substantial advancements in both shape and texture generation. For shape generation, we introduce a new shape foundation model -- LATTICE, which is trained with scaled high-quality datasets, model size, and compute. Our largest model reaches 10B parameters and generates sharp and detailed 3D shapes with precise image-to-3D following while keeping mesh surfaces clean and smooth, significantly closing the gap between generated and handcrafted 3D shapes. For texture generation, the pipeline is upgraded with physically based rendering (PBR) via a novel multi-view architecture extended from the Hunyuan3D 2.0 Paint model. Our extensive evaluation shows that Hunyuan3D 2.5 significantly outperforms previous methods in both shape and end-to-end texture generation.
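The two-stage pipeline described above can be sketched schematically. This is a minimal illustration, not the authors' implementation: the function names (`generate_shape`, `generate_pbr_textures`), the placeholder geometry, and the texture map layout are all hypothetical stand-ins for the LATTICE shape model and the multi-view PBR texture stage.

```python
import numpy as np


def generate_shape(image: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Stage 1 (hypothetical stand-in for the LATTICE shape diffusion
    model): conditions on a single input image and returns mesh
    geometry as (vertices, faces). Here we emit a fixed placeholder."""
    rng = np.random.default_rng(0)
    vertices = rng.standard_normal((8, 3))        # placeholder vertex positions
    faces = np.array([[0, 1, 2], [2, 3, 0]])      # placeholder triangles
    return vertices, faces


def generate_pbr_textures(image: np.ndarray,
                          mesh: tuple[np.ndarray, np.ndarray],
                          size: int = 64) -> dict[str, np.ndarray]:
    """Stage 2 (hypothetical stand-in for multi-view PBR texture
    synthesis): returns a set of view-consistent PBR texture maps.
    The map names follow the common metallic-roughness convention."""
    channels = {"albedo": 3, "metallic_roughness": 2, "normal": 3}
    return {name: np.zeros((size, size, c), dtype=np.float32)
            for name, c in channels.items()}


def image_to_3d(image: np.ndarray):
    """End-to-end pipeline: image -> mesh -> textured asset."""
    mesh = generate_shape(image)
    textures = generate_pbr_textures(image, mesh)
    return mesh, textures


# Run the sketch on a dummy 512x512 RGB input.
mesh, textures = image_to_3d(np.zeros((512, 512, 3), dtype=np.float32))
```

The point of the separation is that geometric fidelity and material realism are decoupled: stage 1 only has to produce clean, accurate geometry, while stage 2 only has to produce view-coherent PBR maps conditioned on that geometry.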