AI Summary
Existing 3D text-to-texture generation methods suffer from inter-view texture inconsistency, caused by asynchronous multi-view diffusion and insufficient cross-view information sharing. To address this, we propose a synchronized multi-view diffusion mechanism that performs cross-view weighted fusion of texture-domain latent representations over overlapping regions at each denoising step, combined with geometric view alignment and synchronized feature updates, ensuring geometric consistency and seamless textures at the source of generation. Our approach builds on a pretrained text-to-image diffusion model and introduces, for the first time, a latent-space, texture-domain cross-view content-consensus paradigm. Extensive experiments show that our method significantly outperforms state-of-the-art approaches in both qualitative and quantitative evaluations, generating high-fidelity, geometrically faithful textures with strong inter-view consistency, while remaining compatible with diverse text prompts.
Abstract
This paper introduces a novel approach to synthesizing a texture that dresses up a given 3D object according to a text prompt. Building on a pretrained text-to-image (T2I) diffusion model, existing methods usually follow a project-and-inpaint approach: a view of the given object is generated first, then warped to other views for inpainting. However, this tends to produce inconsistent textures due to the asynchronous diffusion of the individual views. We believe such asynchronous diffusion and insufficient information sharing among views are the root causes of the inconsistency artifacts. In this paper, we propose a synchronized multi-view diffusion approach that allows the diffusion processes of different views to reach a consensus on the generated content early in the process, thereby ensuring texture consistency. To synchronize the diffusion, we share the denoised content among views at each denoising step; specifically, we blend the latent content of overlapping views in the texture domain. Our method generates consistent, seamless, and highly detailed textures, outperforming state-of-the-art methods.
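The per-step synchronization described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `denoise_fn` stands in for one step of a pretrained T2I diffusion model, and the latents are assumed to be already mapped into a shared texture-domain layout with per-view visibility masks (both hypothetical simplifications).

```python
import numpy as np

def synchronized_denoise_step(view_latents, view_masks, denoise_fn):
    """One synchronized denoising step (illustrative sketch).

    view_latents: per-view latents, each laid out in a shared
                  texture-domain grid of shape (H, W, C) (assumption).
    view_masks:   per-view visibility masks of shape (H, W),
                  1 where the view covers a texel, 0 elsewhere.
    denoise_fn:   stand-in for one denoising step of a pretrained
                  T2I diffusion model (hypothetical signature).
    """
    # 1) Each view performs its own denoising step independently.
    denoised = [denoise_fn(z) for z in view_latents]

    # 2) Blend the denoised latents in the texture domain:
    #    a visibility-weighted average over overlapping texels.
    fused = np.zeros_like(denoised[0], dtype=float)
    weight_sum = np.zeros(view_masks[0].shape, dtype=float)
    for z, m in zip(denoised, view_masks):
        fused += z * m[..., None]
        weight_sum += m
    fused /= np.maximum(weight_sum, 1e-8)[..., None]

    # 3) Each view resumes from the shared consensus on the texels
    #    it sees, keeping its own latent elsewhere.
    return [np.where(m[..., None] > 0, fused, z)
            for z, m in zip(denoised, view_masks)]
```

Because the blend happens at every denoising step, overlapping views converge on the same texture content early, rather than being reconciled after the fact as in project-and-inpaint pipelines.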