AI Summary
Existing 3D texture generation methods suffer from inter-view inconsistency and incomplete coverage of complex surfaces due to multi-view fusion, compromising texture fidelity and geometric integrity. To address this, we propose TEXTRIX, a novel framework that introduces the first implicit 3D attribute grid, unifying texture synthesis and semantic segmentation directly in native voxel space and thereby eliminating the need for explicit view alignment. We further design a sparse-attention diffusion Transformer that jointly optimizes high-resolution texture generation and part-level semantic prediction on this grid. Our approach simultaneously improves texture seamlessness, geometric consistency, and segmentation boundary accuracy. Extensive experiments demonstrate state-of-the-art performance on both 3D texture generation and 3D semantic segmentation benchmarks.
Abstract
Prevailing 3D texture generation methods, which rely on multi-view fusion, are frequently hindered by inter-view inconsistencies and incomplete coverage of complex surfaces, limiting the fidelity and completeness of the generated content. To overcome these challenges, we introduce TEXTRIX, a native 3D attribute generation framework for high-fidelity texture synthesis and downstream applications such as precise 3D part segmentation. Our approach constructs a latent 3D attribute grid and leverages a Diffusion Transformer equipped with sparse attention, enabling direct coloring of 3D models in volumetric space and fundamentally avoiding the limitations of multi-view fusion. Built upon this native representation, the framework naturally extends to high-precision 3D segmentation by training the same architecture to predict semantic attributes on the grid. Extensive experiments demonstrate state-of-the-art performance on both tasks, producing seamless, high-fidelity textures and accurate 3D part segmentation with precise boundaries.
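The core idea — attributes (color, semantics) attached to occupied voxels and refined by attention restricted to a sparse set of positions — can be illustrated with a toy sketch. Everything here is an assumption for illustration: the grid size, attribute dimension, locality-based attention mask, and the two linear output heads are hypothetical stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: N occupied surface voxels with d-dim latent attributes.
N, d = 64, 16
coords = rng.integers(0, 32, size=(N, 3)).astype(float)  # voxel coordinates
x = rng.normal(size=(N, d))                              # per-voxel attribute latents

def attention(x, Wq, Wk, Wv, mask):
    """Single-head attention over occupied voxels, masked to a sparse pattern."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = np.where(mask, scores, -1e9)                # block disallowed pairs
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v

# Locality mask: each voxel attends only to voxels within a radius — a simple
# stand-in for whatever sparse-attention pattern the Transformer actually uses.
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
mask = dist < 8.0                                        # self-attention always allowed

Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
h = x + attention(x, Wq, Wk, Wv, mask)                   # one residual attention step

# Two heads on the shared grid features: texture (RGB) and part-label logits.
W_rgb = rng.normal(size=(d, 3)) * 0.1
W_seg = rng.normal(size=(d, 5)) * 0.1                    # assume 5 semantic parts
rgb = 1.0 / (1.0 + np.exp(-(h @ W_rgb)))                 # colors in [0, 1]
seg = (h @ W_seg).argmax(axis=1)                         # per-voxel part label

print(rgb.shape, seg.shape)
```

The point of the sketch is the shared representation: one set of voxel features feeds both the texture head and the segmentation head, so the same grid-native backbone can serve both tasks, as the abstract describes.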