Universal 3D Shape Matching via Coarse-to-Fine Language Guidance

📅 2026-02-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Establishing semantic dense correspondences across categories and under strong non-isometric deformations in 3D shapes remains highly challenging. This work proposes UniMatch, a novel framework that, for the first time, integrates multimodal large language models (MLLMs) with vision-language models (VLMs) to achieve universal 3D shape matching without relying on predefined parts or category-specific priors. UniMatch employs a coarse-to-fine language-guided strategy, combining category-agnostic segmentation, MLLM-driven part naming, VLM-based text embeddings, and rank-based contrastive learning to significantly enhance matching robustness under non-isometric deformations. Extensive experiments demonstrate that UniMatch consistently outperforms existing methods across diverse cross-category, non-isometric 3D matching tasks, validating its generality and effectiveness.

📝 Abstract

Establishing dense correspondences between shapes is a crucial task in computer vision and graphics, yet prior approaches depend on near-isometric assumptions and homogeneous subject types (e.g., they operate only on human shapes). Building semantic correspondences for cross-category objects remains challenging and has received relatively little attention. To address this, we propose UniMatch, a semantic-aware, coarse-to-fine framework for constructing dense semantic correspondences between strongly non-isometric shapes without restricting object categories. The key insight is to lift "coarse" semantic cues into "fine" correspondences, which is achieved in two stages. In the "coarse" stage, we perform class-agnostic 3D segmentation to obtain non-overlapping semantic parts and prompt multimodal large language models (MLLMs) to identify part names. Then, we employ pretrained vision-language models (VLMs) to extract text embeddings, enabling the construction of matched semantic parts. In the "fine" stage, we leverage these coarse correspondences to guide the learning of dense correspondences through a dedicated rank-based contrastive scheme. Thanks to class-agnostic segmentation, language guidance, and rank-based contrastive learning, our method applies to universal object categories and requires no predefined part proposals, enabling universal matching for inter-class and non-isometric shapes. Extensive experiments demonstrate that UniMatch consistently outperforms competing methods in various challenging scenarios.
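
To make the "coarse" stage more concrete, here is a minimal sketch of how named parts on two shapes might be paired through their text embeddings. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the matching rule, so mutual nearest neighbors under cosine similarity is assumed here. The function names (`cosine`, `match_parts`), the `min_sim` threshold, and the hand-written 3-dimensional vectors are hypothetical stand-ins for embeddings that a pretrained VLM text encoder would produce from MLLM-generated part names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_parts(src, tgt, min_sim=0.5):
    """Pair parts of two shapes by mutual nearest neighbor in embedding space.

    src, tgt: dicts mapping part name -> text embedding (list of floats).
    Assumption: the mutual-best rule and the min_sim threshold are
    illustrative choices, not taken from the paper.
    """
    pairs = []
    for s_name, s_vec in src.items():
        # Best target part for this source part.
        t_name = max(tgt, key=lambda k: cosine(s_vec, tgt[k]))
        # Best source part for that target part (the reverse direction).
        back = max(src, key=lambda k: cosine(tgt[t_name], src[k]))
        sim = cosine(s_vec, tgt[t_name])
        if back == s_name and sim >= min_sim:
            pairs.append((s_name, t_name, sim))
    return pairs


# Toy embeddings (hypothetical values standing in for VLM text embeddings).
horse = {
    "head": [1.0, 0.1, 0.0],
    "leg": [0.0, 1.0, 0.1],
    "tail": [0.1, 0.0, 1.0],
}
bird = {
    "head": [0.9, 0.2, 0.1],
    "foot": [0.1, 0.9, 0.2],
    "wing": [0.6, 0.6, 0.5],
}

pairs = match_parts(horse, bird)
for s_name, t_name, sim in pairs:
    print(f"{s_name} <-> {t_name} (cosine {sim:.2f})")
```

In this toy example, "head" pairs with "head" and "leg" pairs with "foot", while "tail" is left unmatched because its nearest target ("wing") is closer to other source parts. Part-level pairs of this kind are the sort of coarse correspondence that the "fine" stage would then use as guidance for dense matching.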

Problem

Research questions and friction points this paper is trying to address.

3D shape matching

dense correspondence

cross-category

non-isometric

semantic correspondence

Innovation

Methods, ideas, or system contributions that make the work stand out.

universal 3D shape matching

coarse-to-fine guidance

language-guided correspondence