Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis

📅 2026-04-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work targets the need for accurate, generalizable, and interpretable 3D geometric representations in industrial CAD analysis. It introduces Shape, a self-supervised foundation model for 3D geometry that is pretrained on unlabeled CAD meshes. Its core components are a multi-scale geometry-aware tokenizer (MAGNO), a structured 3D latent grid, a transformer with grouped-query attention, and a learned reconstruction prior. Pretraining combines multi-resolution contrastive consistency with masked-token reconstruction of normalized geometry statistics, where per-dimension normalization of the targets proves critical to performance. Evaluated on 2,983 held-out CAD meshes, Shape attains a reconstruction R² of 0.729 and 98.1% top-1 retrieval accuracy under the Wang–Isola protocol, with a near-zero train/validation gap in reconstruction.
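
As a reading aid for the two headline metrics, the sketch below computes a coefficient of determination (R²) and a generic nearest-neighbor top-1 retrieval accuracy on toy data. It is a minimal illustration under stated assumptions: the function names, the cosine-similarity matching rule, and the toy vectors are all hypothetical, and the retrieval routine is not claimed to reproduce the Wang–Isola protocol used in the paper.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top1_retrieval(queries, gallery):
    """Fraction of queries whose most similar gallery item shares their index.

    Assumption: queries[i] and gallery[i] are two embeddings of the same mesh
    (for example, two resolutions); this pairing rule is illustrative only.
    """
    hits = 0
    for i, q in enumerate(queries):
        best = max(range(len(gallery)), key=lambda j: cosine(q, gallery[j]))
        hits += int(best == i)
    return hits / len(queries)


# Toy data, not taken from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y_true, y_pred)

queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
top1 = top1_retrieval(queries, gallery)
```

An R² near 1 means the predictions explain most of the variance in the targets, while top-1 retrieval measures how often the nearest gallery embedding belongs to the same object as the query.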

📝 Abstract
Industrial CAD workflows require robust, generalizable 3D geometric representations supporting accuracy and explainability. We introduce Shape, a self-supervised foundation model converting surface meshes into dense per-token embeddings. Shape combines a structured 3D latent grid, a multi-scale geometry-aware tokenizer (MAGNO) with cross-attention, and a transformer processor using grouped-query attention and RMSNorm. A learned reconstruction prior enables per-region attribution for explainable predictions. Pretraining uses masked-token reconstruction of normalized geometry statistics and multi-resolution contrastive consistency. The 10.9M-parameter backbone is pretrained on 61,052 CAD meshes from Thingi10K, MFCAD, and Fusion360. On a held-out split of 2,983 meshes, Shape achieves reconstruction R² = 0.729 and 98.1% top-1 retrieval under the Wang-Isola protocol, with near-zero reconstruction train/val gap (contrastive scores use a larger evaluation pool). A 2×2 ablation on loss type and target-space normalization shows per-dimension normalization is critical: without it, performance collapses (R² < 0.14, top-1 < 88%); with it, both losses succeed (R² > 0.70, top-1 > 96%). Smooth-L1 offers secondary stability. Code, embeddings, and an interactive demo are released at https://github.com/simd-ai/shape.
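
The ablation contrasts two choices: whether reconstruction targets are normalized per dimension, and whether the loss is Smooth-L1 or a squared-error alternative. The sketch below illustrates both ideas on toy data. It is an assumption-laden illustration rather than the released implementation: the helper names, the two-dimensional toy "geometry statistics", the use of MSE as the comparison loss, and the `beta` threshold are all hypothetical.

```python
import math


def fit_per_dim_stats(targets, eps=1e-8):
    """Per-dimension mean and standard deviation over a list of target vectors."""
    n = len(targets)
    dims = len(targets[0])
    means = [sum(t[d] for t in targets) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((t[d] - means[d]) ** 2 for t in targets) / n) + eps
        for d in range(dims)
    ]
    return means, stds


def normalize(vec, means, stds):
    """Standardize each dimension independently so all share a common scale."""
    return [(v - m) / s for v, m, s in zip(vec, means, stds)]


def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth-L1 loss: quadratic for |error| < beta, linear beyond it."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def mse(pred, target):
    """Mean squared error, shown only as a point of comparison."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical targets: dimension 0 is tiny in scale, dimension 1 is huge.
targets = [[0.01, 1000.0], [0.03, 3000.0], [0.02, 2000.0]]
means, stds = fit_per_dim_stats(targets)
normed = [normalize(t, means, stds) for t in targets]

# A large residual is penalized linearly by Smooth-L1 but quadratically by MSE.
large_error_smooth_l1 = smooth_l1([0.0], [3.0])
large_error_mse = mse([0.0], [3.0])
```

Without normalization, the large-scale dimension would dominate any summed loss and the small-scale one would contribute almost nothing to the gradient; after standardization both dimensions carry comparable weight. The linear tail of Smooth-L1 limits the influence of outlier targets, which is one plausible reading of the secondary stability benefit the abstract reports.
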
Problem

Research questions and friction points this paper is trying to address.

3D geometry
CAD analysis
self-supervised learning
geometric representation
explainability
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning
3D foundation model
geometry-aware tokenization
explainable AI
masked reconstruction