🤖 AI Summary
This work addresses the lack of physical plausibility and the severe visual artifacts common in text-to-texture synthesis. Existing approaches based on score distillation sampling rely on implicit texture parameterizations, which introduce artifacts and require strong regularization. To overcome these limitations, the authors propose CasTex, an approach built on cascaded diffusion models that abandons implicit parameterization in favor of explicit texture maps. CasTex combines differentiable rasterization with a physically based rendering (PBR) shading pipeline for end-to-end optimization. Its cascaded design first generates a coarse base texture and then refines it with high-frequency details, balancing global coherence and local fidelity. On public benchmarks, CasTex significantly outperforms prior optimization-based methods: it avoids artifacts without auxiliary regularization and produces textures with consistent lighting, geometry awareness, and high detail fidelity.
📝 Abstract
This work investigates text-to-texture synthesis using diffusion models to generate physically based texture maps, aiming for realistic model appearance under varying lighting conditions. A prominent solution for the task is score distillation sampling, which recovers a complex texture via gradient guidance through a differentiable rasterization and shading pipeline. In practice, however, this solution combined with widespread latent diffusion models produces severe visual artifacts and requires additional regularization such as implicit texture parameterization. As a more direct alternative, we propose an approach using cascaded diffusion models for texture synthesis (CasTex). In our setup, score distillation sampling yields high-quality textures out of the box. In particular, we were able to replace the implicit texture parameterization with an explicit one, simplifying and improving the procedure. In experiments, our approach significantly outperforms state-of-the-art optimization-based solutions on public texture synthesis benchmarks.
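The core idea of the abstract — optimizing an explicit texture map with score-distillation gradients flowing back through a differentiable renderer — can be sketched in a few lines. The sketch below is illustrative only: `toy_render` and `toy_eps` are simple differentiable stand-ins for the paper's rasterization/PBR pipeline and diffusion model, not the actual CasTex components.

```python
import torch
import torch.nn.functional as F

def toy_render(texture: torch.Tensor) -> torch.Tensor:
    # Stand-in for differentiable rasterization + PBR shading:
    # average-pool the texture map into a small rendered "image".
    return F.avg_pool2d(texture, kernel_size=4)

def toy_eps(noisy: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Stand-in for a diffusion model's noise prediction (hypothetical).
    return 0.1 * noisy * t

# Explicit texture parameterization: the texture map itself is the
# optimization variable (no implicit network in between).
texture = torch.rand(1, 3, 64, 64, requires_grad=True)
opt = torch.optim.Adam([texture], lr=1e-2)

for step in range(10):
    img = toy_render(texture)                 # differentiable rendering
    t = torch.rand(())                        # random diffusion timestep
    noise = torch.randn_like(img)
    noisy = img + noise                       # simplified forward diffusion
    eps_pred = toy_eps(noisy, t)
    # SDS-style update: the (eps_pred - noise) residual is treated as a
    # gradient on the rendered image and backpropagated through the
    # renderer into the explicit texture map.
    grad = (eps_pred - noise).detach()
    loss = (grad * img).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key point the sketch illustrates is the gradient path: because the texture is an explicit tensor, the distillation signal reaches it directly through the renderer, with no implicit parameterization to regularize.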