RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis

📅 2025-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address seam artifacts and ghosting caused by multi-view inconsistency in 3D texture generation, this paper proposes an end-to-end UV texture synthesis framework that tightly integrates 3D geometric priors with 2D diffusion model capabilities. Methodologically: (1) it introduces a 3D-aware rotary positional embedding to explicitly model viewpoint-geometry relationships; (2) it designs a decoupled multi-attention module to enhance semantic consistency for occluded back views; and (3) it proposes a geometry-related classifier-free guidance strategy to improve texture-geometry alignment. Experiments demonstrate state-of-the-art performance in both texture fidelity and cross-view consistency, effectively suppressing seams and ghosting artifacts. Quantitative evaluations, including PSNR, LPIPS, and geometry-aware metrics, alongside user studies confirm significant improvements over prior methods in visual quality, geometric coherence, and perceptual realism.
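The first contribution, a rotary positional embedding driven by 3D coordinates, can be sketched roughly as follows. This is a generic illustration of applying RoPE-style channel-pair rotations along each spatial axis, not the paper's exact formulation; the function names, channel layout, and frequency schedule are assumptions.

```python
import numpy as np

def rotate_pairs(x, angles):
    """Rotate each (even, odd) channel pair of x by the given angles.
    x: (..., 2k) features, angles: (..., k) rotation angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(feats, coords, base=10000.0):
    """Hypothetical 3D-aware rotary embedding: split the channel dimension
    into three groups and rotate each group by angles derived from one of
    the token's x/y/z coordinates on the geometry.
    feats: (n, d) token features with d divisible by 6
    coords: (n, 3) per-token 3D positions."""
    n, d = feats.shape
    assert d % 6 == 0, "need an even number of channel pairs per axis"
    d_axis = d // 3                      # channels assigned to each axis
    k = d_axis // 2                      # rotation pairs per axis
    freqs = base ** (-np.arange(k) / k)  # geometric frequency schedule
    out = np.empty_like(feats)
    for axis in range(3):
        sl = slice(axis * d_axis, (axis + 1) * d_axis)
        angles = coords[:, axis:axis + 1] * freqs[None, :]  # (n, k)
        out[:, sl] = rotate_pairs(feats[:, sl], angles)
    return out
```

As with standard RoPE, the rotations preserve feature norms, and the dot product between two rotated query/key vectors depends only on the difference of their 3D coordinates, which is what lets attention reason about relative geometric placement across views.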

📝 Abstract
Painting textures for existing geometries is a critical yet labor-intensive process in 3D asset generation. Recent advancements in text-to-image (T2I) models have led to significant progress in texture generation. Most existing research approaches this task by first generating images in 2D spaces using image diffusion models, followed by a texture baking process to achieve UV texture. However, these methods often struggle to produce high-quality textures due to inconsistencies among the generated multi-view images, resulting in seams and ghosting artifacts. In contrast, 3D-based texture synthesis methods aim to address these inconsistencies, but they often neglect 2D diffusion model priors, making them challenging to apply to real-world objects. To overcome these limitations, we propose RomanTex, a multiview-based texture generation framework that integrates a multi-attention network with an underlying 3D representation, facilitated by our novel 3D-aware Rotary Positional Embedding. Additionally, we incorporate a decoupling characteristic in the multi-attention block to enhance the model's robustness in the image-to-texture task, enabling semantically correct back-view synthesis. Furthermore, we introduce a geometry-related Classifier-Free Guidance (CFG) mechanism to further improve the alignment with both geometries and images. Quantitative and qualitative evaluations, along with comprehensive user studies, demonstrate that our method achieves state-of-the-art results in texture quality and consistency.
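The decoupling idea in the multi-attention block can be illustrated with a minimal sketch: query tokens attend to the reference image and to the other views in separate branches whose outputs are combined, so the image-condition branch can be weakened or dropped for occluded regions. This is a simplified assumption about the mechanism; projection matrices, heads, and normalization are omitted, and all names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Plain scaled dot-product attention (single head, no projections)."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v

def decoupled_multi_attention(x, ref_kv, mv_kv, w_ref=1.0, w_mv=1.0):
    """Sketch of a decoupled multi-attention block: the reference-image
    branch and the multi-view branch run independently and their outputs
    are summed onto the residual stream. Lowering w_ref for back views
    lets the model fall back on its learned priors instead of copying
    unreliable image evidence for unseen surfaces.
    x: (n, d) query tokens; ref_kv / mv_kv: (keys, values) tuples."""
    return x + w_ref * attention(x, *ref_kv) + w_mv * attention(x, *mv_kv)
```

Because the two branches are additive rather than concatenated into one attention, each condition can be scaled or disabled at inference time without retraining, which is the robustness property the abstract attributes to the decoupling characteristic.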
Problem

Research questions and friction points this paper is trying to address.

Addressing inconsistencies in multi-view image generation for textures
Integrating 2D diffusion priors with 3D texture synthesis
Reducing seams and ghosting artifacts in UV texture baking
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D-aware Rotary Positional Embedding enhances texture synthesis
Decoupling multi-attention block improves back-view synthesis
Geometry-related CFG aligns textures with geometries and images
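The third bullet, a geometry-related CFG, can be sketched as a nested classifier-free guidance combination over three denoiser outputs: unconditional, geometry-conditioned, and fully conditioned (geometry plus image). This is a generic multi-condition CFG pattern, assumed for illustration rather than taken from the paper; the weights and function name are placeholders.

```python
import numpy as np

def multi_condition_cfg(eps_uncond, eps_geo, eps_full, w_geo=2.0, w_full=7.5):
    """Nested CFG sketch: first push the prediction toward the
    geometry-conditioned output, then toward the fully conditioned one.
    eps_*: denoiser noise predictions of identical shape."""
    return (eps_uncond
            + w_geo * (eps_geo - eps_uncond)     # geometry guidance term
            + w_full * (eps_full - eps_geo))     # image/text guidance term
```

With `w_geo = w_full = 1` this reduces to the fully conditioned prediction; raising `w_geo` independently strengthens adherence to the geometry without over-amplifying the image condition, which is the alignment trade-off the bullet describes.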