TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

πŸ“… 2026-03-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge of automatically generating realistic and view-consistent appearances for untextured 3D modelsβ€”a critical task in digital content creation. The authors propose a geometry-conditioned video diffusion approach that leverages multimodal geometric feature encoding to constrain the generation of 360-degree turntable videos. A novel 3D-aware inpainting mechanism is introduced to reconstruct self-occluded regions, ensuring complete surface coverage. By explicitly integrating geometric priors into the video diffusion framework, the method produces high-quality, temporally coherent dynamic previews suitable for direct use in UV texture back-projection or as supervision for neural rendering pipelines. Experimental results demonstrate significant improvements over existing techniques in both view consistency and 3D reconstruction fidelity, enabling fully automated production of ready-to-use 3D assets.
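To make the "pixel-level geometric conditioning" idea concrete, below is a minimal PyTorch sketch of one plausible conditioning scheme: per-frame geometry buffers (camera-space normals plus depth, rendered from the rotating mesh) are encoded to the latent resolution and channel-concatenated with the noisy video latents. The class name `GeometryConditioner`, the channel counts, and concatenation as the injection mechanism are all assumptions for illustration; the paper's actual encoder and injection scheme may differ.

```python
# Minimal sketch, not the paper's architecture: shapes, encoder, and
# the concatenation-based injection are illustrative assumptions.
import torch
import torch.nn as nn

class GeometryConditioner(nn.Module):
    """Encodes per-frame geometry renders (normals + depth) and attaches
    them to video diffusion latents as extra, pixel-aligned channels."""

    def __init__(self, gbuffer_channels: int = 4, latent_channels: int = 4):
        super().__init__()
        # Two stride-2 convs: 4x spatial downsampling, matching a latent
        # space at 1/4 the resolution of the rendered buffers.
        self.encoder = nn.Sequential(
            nn.Conv2d(gbuffer_channels, 64, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, latent_channels, 3, stride=2, padding=1),
        )

    def forward(self, latents: torch.Tensor, gbuffers: torch.Tensor) -> torch.Tensor:
        # latents:  (B, T, C, H, W)   noisy latents for T turntable frames
        # gbuffers: (B, T, 4, 4H, 4W) normals (3 ch) + depth (1 ch) per frame
        b, t = gbuffers.shape[:2]
        feats = self.encoder(gbuffers.flatten(0, 1))    # (B*T, C, H, W)
        feats = feats.view(b, t, *feats.shape[1:])      # (B, T, C, H, W)
        # Channel-concatenate so every latent pixel sees the geometry
        # rendered at the same image location in the same frame.
        return torch.cat([latents, feats], dim=2)

# Example: 24 frames covering a 360-degree turntable sweep.
cond = GeometryConditioner()
latents = torch.randn(1, 24, 4, 32, 32)
gbuffers = torch.randn(1, 24, 4, 128, 128)
print(cond(latents, gbuffers).shape)  # torch.Size([1, 24, 8, 32, 32])
```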

πŸ“ Abstract
Automatically generating photorealistic and self-consistent appearances for untextured 3D models is a critical challenge in digital content creation. The advancement of large-scale video generation models offers a natural approach: directly synthesizing 360-degree turntable videos (TTVs), which can serve not only as high-quality dynamic previews but also as an intermediate representation that drives texture synthesis and neural rendering. However, existing general-purpose video diffusion models struggle to maintain strict geometric consistency and appearance stability across the full range of views, making their outputs ill-suited for high-quality 3D reconstruction. To address this, we introduce TAPESTRY, a framework for generating high-fidelity TTVs conditioned on explicit 3D geometry. We reframe 3D appearance generation as a geometry-conditioned video diffusion problem: given a 3D mesh, we first render and encode multi-modal geometric features that constrain the video generation process with pixel-level precision, enabling the creation of high-quality, consistent TTVs. Building on this, we design a downstream reconstruction method that takes the generated TTVs as input: a multi-stage pipeline with 3D-aware inpainting that rotates the model and performs a context-aware secondary generation, effectively completing self-occluded regions to achieve full surface coverage. The videos generated by TAPESTRY are not only high-quality dynamic previews but also a reliable, 3D-aware intermediate representation that can be seamlessly back-projected into UV textures or used to supervise neural rendering methods such as 3DGS. This enables the automated creation of production-ready, complete 3D assets from untextured meshes. Experimental results demonstrate that our method outperforms existing approaches in both video consistency and final reconstruction quality.
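As a complement, the sketch below shows one simple way the generated TTV frames could be baked back into a UV texture, the downstream use the abstract mentions. Everything here is assumed for illustration: the `rasterize` callback (returning per-pixel UV coordinates and surface normals for a given turntable camera), the camera's `view_dir` attribute, and cosine-weighted blending are stand-ins, not the paper's pipeline.

```python
# Hypothetical sketch of UV back-projection from turntable frames.
# `rasterize` and `cam.view_dir` are assumed interfaces, not a real API.
import numpy as np

def bake_texture(frames, cameras, rasterize, tex_size=1024):
    """frames: list of (H, W, 3) float images from the generated TTV.
    cameras: matching turntable cameras (poses are known by construction).
    rasterize(cam) -> (uv, normal): (H, W, 2) UVs in [0, 1] and (H, W, 3)
    surface normals, with NaN UVs where no geometry is visible."""
    accum = np.zeros((tex_size, tex_size, 3))
    weight = np.zeros((tex_size, tex_size, 1))
    for img, cam in zip(frames, cameras):
        uv, normal = rasterize(cam)
        hit = ~np.isnan(uv[..., 0])
        # Cosine weighting: frontal views dominate, grazing views fade out.
        # (Sign convention assumes view_dir points from camera into scene.)
        w = np.clip(-(normal[hit] @ cam.view_dir), 0.0, 1.0)[:, None]
        u = (uv[hit, 0] * (tex_size - 1)).astype(int)
        v = (uv[hit, 1] * (tex_size - 1)).astype(int)
        np.add.at(accum, (v, u), img[hit] * w)   # splat weighted colors
        np.add.at(weight, (v, u), w)
    return accum / np.maximum(weight, 1e-6)      # normalized UV texture
```

Note that a turntable sweep alone leaves self-occluded texels empty; that is precisely the gap the paper's 3D-aware inpainting stage (rotate, then context-aware secondary generation) is meant to fill.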
Problem

Research questions and friction points this paper is trying to address.

3D appearance generation
geometric consistency
turntable videos
texture synthesis
neural rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

geometry-conditioned video diffusion
turntable video generation
3D-aware inpainting
consistent appearance synthesis
neural rendering supervision
πŸ”Ž Similar Papers
No similar papers found.
πŸ‘₯ Authors
Yan Zeng
ShanghaiTech University; Deemos Technology
Haoran Jiang
ShanghaiTech University; Deemos Technology
Kaixin Yao
ShanghaiTech University; Deemos Technology
Qixuan Zhang
ShanghaiTech University; Deemos Technology
Longwen Zhang
ShanghaiTech University; Deemos Technology
Lan Xu
ShanghaiTech University
Jingyi Yu
Professor, ShanghaiTech University
Computer Vision; Computer Graphics