🤖 AI Summary
This work addresses two key challenges in text-driven material texture generation: low physical fidelity and misalignment between multi-view images and 3D meshes. To this end, we propose TexPro, an end-to-end framework that combines multi-view text-to-image diffusion models (e.g., SDXL) with differentiable procedural material inversion. We introduce a part-aware, object-aware material agent that integrates geometry-aware UV mapping with a material-semantic matching network to align generated textures with mesh geometry. Furthermore, we use differentiable procedural materials (e.g., DiffMat) with differentiable rendering to jointly optimize physically based rendering (PBR) texture maps, including normals and roughness, under unified supervision. Evaluated on multiple benchmarks, TexPro surpasses state-of-the-art methods: it improves physical texture consistency by 37% and achieves 92.4% accuracy in material classification. Additionally, it enables real-time relighting and editing under arbitrary lighting conditions.
📝 Abstract
In this paper, we present TexPro, a novel method for high-fidelity material generation for input 3D meshes given text prompts. Unlike existing text-conditioned texture generation methods, which typically generate RGB textures with baked lighting, TexPro produces diverse texture maps via procedural material modeling, which enables physically based rendering, relighting, and the additional benefits inherent to procedural materials. Specifically, we first generate multi-view reference images from the input text prompt using a recent text-to-image model. We then derive texture maps through rendering-based optimization with recent differentiable procedural materials. To this end, we design several techniques to handle the misalignment between the generated multi-view images and 3D meshes, and introduce a novel material agent that enhances material classification and matching by exploring both part-level understanding and object-aware material reasoning. Experiments demonstrate the superiority of the proposed method over existing state-of-the-art methods, as well as its relighting capability.
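The rendering-based optimization step can be illustrated with a minimal sketch. It is not TexPro's implementation: the two-parameter striped material, the Lambertian toy renderer, and all function names (`procedural_albedo`, `render`, `fit_material`) are hypothetical, and central finite differences stand in for the automatic differentiation that a differentiable procedural material library would provide. The sketch only shows the general idea of recovering procedural material parameters by minimizing the difference between a rendering and a reference image.

```python
import math


def procedural_albedo(params, u):
    """Toy procedural material (assumption): striped albedo with two parameters."""
    base, amplitude = params
    return base + amplitude * math.sin(2.0 * math.pi * 4.0 * u)


def render(params, n_pixels=64):
    """Toy renderer (assumption): Lambertian shading with a fixed cosine term."""
    image = []
    for i in range(n_pixels):
        u = i / n_pixels
        cos_theta = 0.5 + 0.5 * u  # stand-in for the n.l term on a real mesh
        image.append(procedural_albedo(params, u) * cos_theta)
    return image


def loss(params, reference):
    """Mean squared error between the rendering and the reference image."""
    rendered = render(params, len(reference))
    return sum((r - t) ** 2 for r, t in zip(rendered, reference)) / len(reference)


def fit_material(reference, init=(0.5, 0.0), lr=0.5, steps=500, eps=1e-5):
    """Gradient descent on the material parameters.

    Central finite differences approximate the gradient here; a real
    differentiable pipeline would backpropagate through the renderer.
    """
    params = list(init)
    for _ in range(steps):
        grad = []
        for k in range(len(params)):
            hi = params.copy()
            lo = params.copy()
            hi[k] += eps
            lo[k] -= eps
            grad.append((loss(hi, reference) - loss(lo, reference)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Usage: the reference plays the role of a generated reference view.
reference = render((0.6, 0.2))
fitted = fit_material(reference)
print("recovered parameters:", [round(p, 4) for p in fitted])
```

Because the optimized quantities are material parameters rather than pixels, the fitted material can be re-rendered under a different lighting term without re-optimizing, which is the property the abstract attributes to procedural materials.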