CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation

📅 2025-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D generation methods suffer from multi-view inconsistency, coarse surface reconstruction, texture distortion, and high computational cost. To address these issues, we propose Carve-and-Paint, a two-stage framework: (1) a multi-view geometry-guided 3D latent diffusion model generates structurally consistent low-resolution meshes; (2) a model-agnostic spatially decoupled attention mechanism enables high-fidelity 4K texture synthesis, augmented by a 3D-aware occlusion inpainting module to reconstruct unobserved regions. Our key contributions are: (i) a geometry-texture decoupled two-stage generation paradigm; (ii) a novel spatially decoupled attention mechanism; and (iii) the first 3D-aware occlusion inpainting algorithm specifically designed for texture synthesis. The framework generates production-grade 4K textured meshes in under 30 seconds, significantly improving multi-view consistency, surface completeness, and texture fidelity.

📝 Abstract
The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems. While some studies have addressed subsets of these issues, a comprehensive solution remains elusive. In this paper, we introduce CaPa, a carve-and-paint framework that generates high-fidelity 3D assets efficiently. CaPa employs a two-stage process, decoupling geometry generation from texture synthesis. Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives. Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry. Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, resulting in cohesive results across the entire model. This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications. Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.
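The two-stage pipeline described above can be summarized as a minimal sketch. All class and function names below are hypothetical illustrations of the carve-then-paint data flow, not the authors' actual API; the heavy stages (latent diffusion, decoupled attention) are stubbed out.

```python
# Hypothetical sketch of a CaPa-style "carve-and-paint" pipeline.
# Names and signatures are illustrative assumptions, not the paper's code.

from dataclasses import dataclass, field


@dataclass
class Mesh:
    vertices: list                      # carved geometry (stage 1 output)
    texture: dict = field(default_factory=dict)  # per-view texture maps


def carve_geometry(multi_view_images):
    """Stage 1 (stub): a 3D latent diffusion model would generate
    structurally consistent geometry conditioned on multi-view inputs."""
    return Mesh(vertices=[(0.0, 0.0, 0.0)] * len(multi_view_images))


def paint_texture(mesh, multi_view_images, resolution=4096):
    """Stage 2 (stub): spatially decoupled attention would synthesize
    a high-resolution (up to 4K) texture for each input view."""
    for i, _ in enumerate(multi_view_images):
        mesh.texture[f"view_{i}"] = {"resolution": resolution}
    return mesh


def inpaint_occlusions(mesh, resolution=4096):
    """Stub of the 3D-aware occlusion inpainting step, which fills
    regions not visible from any of the input views."""
    mesh.texture["occluded"] = {"resolution": resolution}
    return mesh


views = ["front", "back", "left", "right"]
asset = inpaint_occlusions(paint_texture(carve_geometry(views), views))
```

The key design choice the sketch reflects is the decoupling: geometry is fixed after stage 1, so texture synthesis and occlusion inpainting never perturb the carved surface.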
Problem

Research questions and friction points this paper is trying to address:

- 3D modeling
- view inconsistency
- surface reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out:

- CaPa method
- Ultra-high-definition texture
- Rapid 3D modeling
Hwan Heo
AI Researcher at NCSOFT
Computer Vision · Neural Rendering · Radiance Fields · Gaussian Splatting · 3D Generation
Jangyeong Kim
Researcher, NCSOFT Inc.
Seongyeong Lee
Graphics AI Lab, NC Research
Jeonga Wi
Graphics AI Lab, NC Research
Junyoung Choi
Graphics AI Lab, NC Research
Sangjun Ahn
Graphics AI Lab, NC Research