🤖 AI Summary
Existing 3D texture synthesis methods are constrained by single-view inputs and limited geometric understanding, leading to inconsistent seams between patches, semantic distortion in occluded regions, and temporal flickering in dynamic scenes. To address these limitations, VideoTex is a video-generation-based framework for 3D texture synthesis that integrates geometry-aware diffusion with structural UV-space modeling. The method employs mesh-structured conditioning to enforce geometric awareness during generation and introduces a spatiotemporal diffusion strategy defined over the UV parameterization, thereby modeling both inter-patch topological relationships and inter-frame temporal dependencies. Experiments demonstrate significant improvements over state-of-the-art approaches in three key respects: texture fidelity, seam coherence, and temporal stability, paving the way toward high-fidelity, dynamically consistent 3D content generation for real-time applications.
📝 Abstract
Current texture synthesis methods, which generate textures from fixed viewpoints, suffer from inconsistencies due to the lack of global context and geometric understanding. Meanwhile, recent advances in video generation models have demonstrated remarkable success in producing temporally consistent videos. In this paper, we introduce VideoTex, a novel framework for seamless texture synthesis that leverages video generation models to address both spatial and temporal inconsistencies in 3D textures. Our approach incorporates geometry-aware conditions, enabling precise utilization of 3D mesh structures. Additionally, we propose a structure-wise UV diffusion strategy, which enhances the generation of occluded areas by preserving semantic information, resulting in smoother and more coherent textures. VideoTex not only achieves smoother transitions across UV boundaries but also ensures high-quality, temporally stable textures across video frames. Extensive experiments demonstrate that VideoTex outperforms existing methods in texture fidelity, seam blending, and temporal stability, paving the way for dynamic real-time applications that demand both visual quality and temporal coherence.
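In conditional diffusion models, geometry-aware conditioning of the kind the abstract describes is commonly realized by rendering per-frame geometry buffers (e.g. depth and normal maps) and a UV-chart mask, then stacking them into extra input channels for the denoiser. The sketch below illustrates only that channel-stacking step; the function name, buffer choice, and channel layout are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def build_geometry_condition(depth, normals, uv_mask):
    """Stack per-frame geometry buffers into a conditioning tensor.

    depth:   (T, H, W)    rendered depth per video frame
    normals: (T, H, W, 3) rendered surface normals per frame
    uv_mask: (H, W)       binary mask of valid UV-chart texels (static)

    Returns a (T, H, W, 5) array: depth + normals + broadcast UV mask,
    ready to be concatenated with the diffusion model's latent input.
    """
    t, h, w = depth.shape
    # Broadcast the static UV-chart mask across all T frames.
    mask = np.broadcast_to(uv_mask[None, :, :, None], (t, h, w, 1))
    return np.concatenate([depth[..., None], normals, mask], axis=-1)
```

A real pipeline would feed these channels to the denoising network alongside the noisy latents at every timestep, so the sampler stays anchored to the mesh geometry throughout generation.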