AI Summary
Existing 3D text-to-texture generation methods suffer from inter-view texture inconsistency, caused by asynchronous multi-view diffusion and insufficient cross-view information sharing. To address this, we propose a synchronized multi-view diffusion mechanism that performs cross-view weighted fusion of texture-domain latent representations over overlapping regions at each denoising step, combined with geometric view alignment and synchronized feature updates, ensuring geometric consistency and seamless textures at the source of generation. Our approach builds on a pretrained text-to-image diffusion model and introduces, for the first time, a latent-space, texture-domain cross-view content-consensus paradigm. Extensive experiments show that our method significantly outperforms state-of-the-art approaches in both qualitative and quantitative evaluations, generating high-fidelity, geometrically faithful textures with strong inter-view consistency, while remaining compatible with diverse text prompts.
Abstract
This paper introduces a novel approach to synthesizing a texture that dresses up a given 3D object according to a text prompt. Building on a pretrained text-to-image (T2I) diffusion model, existing methods usually follow a project-and-inpaint approach: a view of the given object is generated first, then warped to other views for inpainting. However, this tends to produce inconsistent textures due to the asynchronous diffusion of the individual views. We believe such asynchronous diffusion and insufficient information sharing among views are the root causes of the inconsistency artifacts. In this paper, we propose a synchronized multi-view diffusion approach that allows the diffusion processes of different views to reach a consensus on the generated content early in the process, thereby ensuring texture consistency. To synchronize the diffusion, we share the denoised content among views at each denoising step; specifically, we blend the latent content of overlapping views in the texture domain. Our method generates consistent, seamless, and highly detailed textures, outperforming state-of-the-art methods.
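The per-step synchronization described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `denoise_fn` stands in for one step of a pretrained T2I diffusion model, and the latents are assumed to be already mapped into a shared texture-domain layout with per-view visibility masks (both hypothetical simplifications).

```python
import numpy as np

def synchronized_denoise_step(view_latents, view_masks, denoise_fn):
    """One synchronized denoising step (illustrative sketch).

    view_latents: per-view latents, each laid out in a shared
                  texture-domain grid of shape (H, W, C) (assumption).
    view_masks:   per-view visibility masks of shape (H, W),
                  1 where the view covers a texel, 0 elsewhere.
    denoise_fn:   stand-in for one denoising step of a pretrained
                  T2I diffusion model (hypothetical signature).
    """
    # 1) Each view performs its own denoising step independently.
    denoised = [denoise_fn(z) for z in view_latents]

    # 2) Blend the denoised latents in the texture domain:
    #    a visibility-weighted average over overlapping texels.
    fused = np.zeros_like(denoised[0], dtype=float)
    weight_sum = np.zeros(view_masks[0].shape, dtype=float)
    for z, m in zip(denoised, view_masks):
        fused += z * m[..., None]
        weight_sum += m
    fused /= np.maximum(weight_sum, 1e-8)[..., None]

    # 3) Each view resumes from the shared consensus on the texels
    #    it sees, keeping its own latent elsewhere.
    return [np.where(m[..., None] > 0, fused, z)
            for z, m in zip(denoised, view_masks)]
```

Because the blend happens at every denoising step, overlapping views converge on the same texture content early, rather than being reconciled after the fact as in project-and-inpaint pipelines.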