🤖 AI Summary
To address three core challenges in 3D texture generation (reference-texture misalignment, geometry-texture inconsistency, and degraded local detail), this paper proposes MVPainter, a multi-view diffusion framework. Methodologically, it conditions multi-view diffusion on the input geometry through ControlNet, extracts physically-based rendering (PBR) attributes from the generated views, reconstructs UV textures by texture baking, and applies data filtering and augmentation. The paper evaluates texture generation along three dimensions (reference-texture alignment, geometry-texture consistency, and local texture quality) and produces PBR meshes suitable for real-world rendering. Experiments report state-of-the-art results on all three dimensions, supported by human-aligned evaluations. The codebase, pre-trained models, and evaluation pipeline are open-sourced.
📝 Abstract
Recently, significant advances have been made in 3D object generation. Building upon the generated geometry, current pipelines typically employ image diffusion models to generate multi-view RGB images, followed by UV texture reconstruction through texture baking. While 3D geometry generation has improved significantly, supported by multiple open-source frameworks, 3D texture generation remains underexplored. In this work, we systematically investigate 3D texture generation through the lens of three core dimensions: reference-texture alignment, geometry-texture consistency, and local texture quality. To address these challenges, we propose MVPainter, which employs data filtering and augmentation strategies to enhance texture fidelity and detail, and introduces ControlNet-based geometric conditioning to improve texture-geometry alignment. Furthermore, we extract physically-based rendering (PBR) attributes from the generated views to produce PBR meshes suitable for real-world rendering applications. MVPainter achieves state-of-the-art results across all three dimensions, as demonstrated by human-aligned evaluations. To facilitate further research and reproducibility, we also release our full pipeline as an open-source system, including data construction, model architecture, and evaluation tools.
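The abstract mentions texture baking only at a high level. As a hedged illustration, the sketch below shows one common baking heuristic, cosine-weighted blending of multi-view colors per texel. It is not MVPainter's implementation. The function name `bake_texel`, the exponent `power`, and the assumption that per-view visibility and sampled colors are already computed for each texel are all hypothetical choices made for this example.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def bake_texel(
    normal: Vec3,
    view_dirs: Sequence[Vec3],
    colors: Sequence[Vec3],
    visible: Sequence[bool],
    power: float = 2.0,
) -> Optional[Vec3]:
    """Blend one texel's RGB color from several generated views (hypothetical helper).

    Assumptions: `normal` and each entry of `view_dirs` (surface-to-camera
    direction) are unit vectors; `colors[i]` is the RGB value that view i
    projects onto this texel; `visible[i]` says whether view i sees it.
    Each visible view is weighted by max(0, cos(angle))**power, so views
    facing the surface head-on dominate grazing ones.
    Returns None when no view contributes, leaving the texel for a later
    inpainting or fill pass.
    """
    total_w = 0.0
    acc = [0.0, 0.0, 0.0]
    for d, c, vis in zip(view_dirs, colors, visible):
        if not vis:
            continue  # occluded views contribute nothing
        w = max(0.0, dot(normal, d)) ** power  # back-facing views get weight 0
        total_w += w
        for k in range(3):
            acc[k] += w * c[k]
    if total_w == 0.0:
        return None
    return (acc[0] / total_w, acc[1] / total_w, acc[2] / total_w)


if __name__ == "__main__":
    # A texel facing +z, seen by a front view (red), a grazing side view
    # (green, zero weight) and an oblique view (blue).
    texel = bake_texel(
        normal=(0.0, 0.0, 1.0),
        view_dirs=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.6, 0.8)],
        colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
        visible=[True, True, True],
    )
    print(texel)
```

Raising the cosine to a power is one simple way to suppress stretched, low-resolution samples from grazing views; full pipelines typically also handle seams and fill unseen texels, which this sketch omits.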