GLASS: Graph and Vision-Language Assisted Semantic Shape Correspondence

📅 2026-03-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work tackles unsupervised dense semantic correspondence between 3D shapes, focusing on severe non-isometric deformations and cross-category pairs, where traditional methods that assume isometry fail. The authors propose a framework that combines geometric spectral analysis with vision-language foundation models. It extracts multi-view consistent visual features, injects language embeddings into vertex descriptors through zero-shot 3D segmentation to add part-level semantic awareness, and adds a graph-structured contrastive loss that exploits geodesic and topological relationships between regions to improve global consistency. On the SNIS, SMAL, and TOPKIDS benchmarks, the approach achieves state-of-the-art performance, reducing the average geodesic error to 0.21, 4.5, and 5.6, respectively, down from the URSSM baseline's 0.49, 6.0, and 8.9 (reductions of 57%, 25%, and 37%).

📝 Abstract

Establishing dense correspondence across 3D shapes is crucial for fundamental downstream tasks, including texture transfer, shape interpolation, and robotic manipulation. However, learning these mappings without manual supervision remains a formidable challenge, particularly under severe non-isometric deformations and in inter-class settings where geometric cues are ambiguous. Conventional functional map methods, while elegant, typically struggle in these regimes due to their reliance on isometry. To address this, we present GLASS, a framework that bridges the gap by integrating geometric spectral analysis with rich semantic priors from vision-language foundation models. GLASS introduces three key innovations: (i) a view-consistent strategy that enables robust multi-view visual feature extraction from powerful vision foundation models; (ii) the injection of language embeddings into vertex descriptors via zero-shot 3D segmentation, capturing high-level part semantics; and (iii) a graph-assisted contrastive loss that enforces structural consistency between regions (e.g., source's "head" $\leftrightarrow$ target's "head") by leveraging geodesic and topological relationships between regions. This design allows GLASS to learn globally coherent and semantically consistent maps without ground-truth supervision. Extensive experiments demonstrate that GLASS achieves state-of-the-art performance across all regimes, maintaining high accuracy on standard near-isometric tasks while significantly advancing performance in challenging settings. Specifically, it achieves average geodesic errors of 0.21, 4.5, and 5.6 on the inter-class benchmark SNIS and non-isometric benchmarks SMAL and TOPKIDS, reducing errors from URSSM baselines of 0.49, 6.0, and 8.9 by 57%, 25%, and 37%, respectively.
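
The percentage reductions quoted in the abstract are relative reductions, (baseline - GLASS) / baseline, computed from the reported average geodesic errors. A minimal check in Python follows; the dictionary and function names are illustrative choices, not taken from the paper, and only the six error values come from the abstract.

```python
# Average geodesic errors as reported in the abstract.
urssm_baseline = {"SNIS": 0.49, "SMAL": 6.0, "TOPKIDS": 8.9}
glass_errors = {"SNIS": 0.21, "SMAL": 4.5, "TOPKIDS": 5.6}


def relative_reduction(before: float, after: float) -> float:
    """Return the relative error reduction in percent."""
    return 100.0 * (before - after) / before


# Round to whole percentages, matching the precision used in the abstract.
reductions = {
    name: round(relative_reduction(urssm_baseline[name], glass_errors[name]))
    for name in urssm_baseline
}
print(reductions)  # {'SNIS': 57, 'SMAL': 25, 'TOPKIDS': 37}
```

The rounded values agree with the 57%, 25%, and 37% stated for SNIS, SMAL, and TOPKIDS.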

Problem

Research questions and friction points this paper is trying to address.

3D shape correspondence

non-isometric deformation

inter-class correspondence

semantic mapping

unsupervised learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic shape correspondence

vision-language foundation models

zero-shot 3D segmentation