Self-Attention Based Multi-Scale Graph Auto-Encoder Network of 3D Meshes

📅 2025-07-07
🤖 AI Summary
Existing methods for 3D mesh data often convert meshes to intermediate representations, such as voxel grids or point clouds, to cope with the non-Euclidean nature of the data, which inevitably introduces geometric distortion and information loss. To address this, we propose a spatial-domain multi-scale graph autoencoder that operates directly on raw triangle meshes. Our architecture integrates anisotropic graph convolution with self-attention mechanisms within a dual-path design, enabling concurrent modeling of fine-grained local geometry and global topological structure. Crucially, the model avoids spectral decomposition and explicit representation conversion, supporting end-to-end mesh reconstruction. Evaluated on the COMA facial dataset, our method achieves state-of-the-art performance in terms of reconstruction accuracy and robustness to complex deformations, demonstrating superior fidelity in capturing intricate shape variations without geometric degradation.
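
Operating directly on a triangle mesh usually starts with reading the graph structure off the face list. The sketch below is a generic illustration, not the paper's code: the function name `vertex_adjacency` and the two-triangle toy mesh are assumptions made for the example.

```python
from collections import defaultdict


def vertex_adjacency(faces):
    """Return {vertex: sorted list of 1-ring neighbours} from triangle faces."""
    neighbours = defaultdict(set)
    for a, b, c in faces:
        # Each triangle contributes its three edges, in both directions.
        for u, v in ((a, b), (b, c), (c, a)):
            neighbours[u].add(v)
            neighbours[v].add(u)
    return {v: sorted(ns) for v, ns in neighbours.items()}


# Hypothetical toy mesh: two triangles sharing the edge (1, 2).
faces = [(0, 1, 2), (1, 3, 2)]
adjacency = vertex_adjacency(faces)
print(adjacency)
```

No voxelization or point sampling is involved: the vertex indices and connectivity of the original mesh are used as they are.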

📝 Abstract
3D meshes are fundamental data representations for capturing complex geometric shapes in computer vision and graphics applications. While Convolutional Neural Networks (CNNs) excel on structured data such as images, extending them to irregular 3D meshes is challenging due to the non-Euclidean nature of the data. Graph Convolutional Networks (GCNs) offer a solution by applying convolutions to graph-structured data, but many existing methods rely on isotropic filters or spectral decomposition, limiting their ability to capture both local and global mesh features. In this paper, we introduce 3D Geometric Mesh Network (3DGeoMeshNet), a novel GCN-based framework that uses anisotropic convolution layers to effectively learn both global and local features directly in the spatial domain. Unlike previous approaches that convert meshes into intermediate representations such as voxel grids or point clouds, our method preserves the original polygonal mesh format throughout the reconstruction process, enabling more accurate shape reconstruction. Our architecture features a multi-scale encoder-decoder structure, where separate global and local pathways capture both large-scale geometric structures and fine-grained local details. Extensive experiments on the COMA dataset of human faces demonstrate the effectiveness of 3DGeoMeshNet in terms of reconstruction accuracy.
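
The contrast between isotropic and anisotropic filters can be made concrete with a toy example. The sketch below is a generic illustration rather than the 3DGeoMeshNet operator, whose exact form is not given here: the function names, the scalar features, the fixed neighbour ordering, and the weights are all assumptions. An isotropic filter shares one weight across all neighbours, so reordering them changes nothing; an anisotropic filter assigns a weight per neighbour slot, so its output depends on direction.

```python
def isotropic_conv(x, neighbours, w):
    """One shared weight for every neighbour: neighbour order is irrelevant."""
    return [w * sum(x[j] for j in ns) for ns in neighbours]


def anisotropic_conv(x, neighbours, weights):
    """One weight per neighbour slot: neighbour order (direction) matters."""
    return [sum(wk * x[j] for wk, j in zip(weights, ns)) for ns in neighbours]


# Hypothetical toy data: 4 vertices with scalar features, each with two
# neighbours listed in a fixed, consistent order.
x = [1.0, 2.0, 3.0, 4.0]
neighbours = [[1, 2], [2, 3], [3, 0], [0, 1]]
swapped = [list(reversed(ns)) for ns in neighbours]

iso = isotropic_conv(x, neighbours, w=0.5)
iso_swapped = isotropic_conv(x, swapped, w=0.5)
aniso = anisotropic_conv(x, neighbours, weights=[1.0, -1.0])
aniso_swapped = anisotropic_conv(x, swapped, weights=[1.0, -1.0])

print(iso == iso_swapped)      # the isotropic output ignores the ordering
print(aniso == aniso_swapped)  # the anisotropic output does not
```

With these particular weights the anisotropic filter behaves like a directional difference, something a single shared weight cannot express.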

Problem

Research questions and friction points this paper is trying to address.

Extending CNNs to irregular 3D meshes is challenging due to non-Euclidean data.
Existing GCNs struggle to capture both local and global mesh features effectively.
Converting meshes to intermediate representations such as voxel grids or point clouds reduces accuracy in 3D shape reconstruction.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Anisotropic convolution layers that learn features directly in the spatial domain
Preservation of the original polygonal mesh format throughout reconstruction
Multi-scale encoder-decoder with separate global and local pathways
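
One generic way to picture separate local and global pathways is to compute a per-vertex neighbourhood descriptor and a mesh-wide descriptor, then pair them. The sketch below is only an illustration under that assumption; it is not the paper's architecture, and the function names, the averaging operations, and the toy data are hypothetical.

```python
def local_path(x, neighbours):
    """Fine scale: average the features of each vertex's 1-ring neighbours."""
    return [sum(x[j] for j in ns) / len(ns) for ns in neighbours]


def global_path(x):
    """Coarse scale: a single mesh-wide average, shared by every vertex."""
    mean = sum(x) / len(x)
    return [mean] * len(x)


def dual_path_features(x, neighbours):
    """Pair the local and global descriptors for each vertex."""
    return list(zip(local_path(x, neighbours), global_path(x)))


# Hypothetical toy data: scalar features on a 4-vertex mesh.
x = [1.0, 2.0, 3.0, 6.0]
neighbours = [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2]]
features = dual_path_features(x, neighbours)
print(features)
```

A learned model would replace both averages with trainable layers; the point here is only that each vertex ends up with one fine-scale and one coarse-scale component.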