🤖 AI Summary
This work addresses semantic segmentation of textured non-manifold 3D meshes, whose irregular structure is difficult for deep networks to process and whose texture information existing methods often neglect. To overcome this limitation, the authors propose a texture-aware Transformer that, for the first time, encodes patch-level texture pixels into learnable tokens and fuses them with geometric features. The method introduces a two-stage Transformer block (TSTB) to enable effective local–global feature interaction and employs a hierarchical multi-scale feature aggregation strategy. Additionally, the study presents the first dataset specifically annotated for damage assessment of roof tiles in cultural heritage sites. The proposed approach achieves state-of-the-art performance, reporting 81.9% mean F1-score and 94.3% overall accuracy on the SUM benchmark, and 49.7% mean F1-score and 72.8% overall accuracy on the newly introduced dataset.
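For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how mean F1-score (mF1) and overall accuracy (OA) are commonly computed from per-face labels. The function name `segmentation_scores` and the toy labels are illustrative assumptions, and every face is counted equally here; the paper's evaluation protocol (for example, weighting faces by surface area) may differ.

```python
from collections import Counter


def segmentation_scores(y_true, y_pred, classes):
    """Return (overall_accuracy, mean_f1) for per-face labels.

    Assumption: each face counts once (no area weighting).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    # OA: fraction of faces whose predicted label is correct.
    overall_accuracy = sum(tp.values()) / len(y_true)

    # mF1: unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).
    f1_scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    mean_f1 = sum(f1_scores) / len(f1_scores)

    return overall_accuracy, mean_f1


# Toy example with two hypothetical classes.
oa, mf1 = segmentation_scores(
    ["roof", "roof", "wall", "wall"],
    ["roof", "wall", "wall", "wall"],
    classes=["roof", "wall"],
)
print(f"OA = {oa:.3f}, mF1 = {mf1:.3f}")  # OA = 0.750, mF1 = 0.733
```

Because mF1 averages over classes without weighting, rare classes influence it as much as frequent ones, which is one reason mF1 can sit well below OA on imbalanced data.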
📝 Abstract
Textured 3D meshes jointly represent geometry, topology, and appearance, yet their irregular structure poses significant challenges for deep-learning-based semantic segmentation. While a few recent methods operate directly on meshes without imposing geometric constraints, they typically overlook the rich textural information that such meshes also provide. We introduce a texture-aware transformer that learns directly from the raw pixels associated with each mesh face, coupled with a new hierarchical learning scheme for multi-scale feature aggregation. A texture branch summarizes all face-level pixels into a learnable token, which is fused with geometric descriptors and processed by a stack of Two-Stage Transformer Blocks (TSTB) that allow both local and global information flow. We evaluate our model on the Semantic Urban Meshes (SUM) benchmark and a newly curated cultural-heritage dataset comprising textured roof tiles with triangle-level annotations for damage types. Our method achieves 81.9% mF1 and 94.3% OA on SUM and 49.7% mF1 and 72.8% OA on the new dataset, substantially outperforming existing approaches.
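The idea of summarizing a variable number of face pixels into one token and fusing it with geometry can be illustrated with a small, dependency-free sketch. This is not the paper's texture branch, whose details the abstract does not give: it assumes single-head dot-product attention with identity key/value projections, a fixed `query` standing in for the learned token, and fusion by plain concatenation. The names `texture_token` and `fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def texture_token(pixels, query):
    """Pool any number of RGB pixels into one fixed-size vector.

    pixels: list of (r, g, b) tuples, one per texel covered by the face.
    query:  list of 3 floats; stands in for a learned token (assumption).
    """
    if not pixels:
        raise ValueError("a face needs at least one pixel")
    scale = math.sqrt(len(query))
    # Attention score of the query against every pixel.
    scores = [sum(q * x for q, x in zip(query, px)) / scale for px in pixels]
    weights = softmax(scores)
    # Attention-weighted average of the pixel values.
    return [
        sum(w * px[i] for w, px in zip(weights, pixels))
        for i in range(len(query))
    ]


def fuse(texture_vec, geometry_vec):
    """Fuse by concatenation (an assumption; other fusions are possible)."""
    return list(texture_vec) + list(geometry_vec)


# Two faces with different pixel counts yield features of equal length.
query = [0.0, 0.0, 0.0]  # zero query gives uniform weights, i.e. mean pooling
face_a = texture_token([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], query)
face_b = texture_token([(0.2, 0.2, 0.2)] * 5, query)
normal = [0.0, 0.0, 1.0]  # hypothetical geometric descriptor: face normal
print(fuse(face_a, normal), fuse(face_b, normal))
```

The point of the pooling step is that faces covering different numbers of texels still produce features of the same size, so they can be fed to a Transformer as a regular token sequence.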