Multi-View Hierarchical Graph Neural Network for Sketch-Based 3D Shape Retrieval

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limitations in sketch-based 3D shape retrieval, where existing multi-view feature aggregation methods often neglect geometric relationships and multi-level details, and exhibit insufficient zero-shot generalization. To overcome these challenges, the authors propose a hierarchical multi-view graph neural network that constructs a view-level graph structure to model inter-view geometric dependencies through local graph convolutions and global attention mechanisms. A novel view selector is introduced to enable hierarchical graph coarsening, progressively expanding the receptive field while suppressing redundant information. Furthermore, for the first time, CLIP text embeddings are leveraged as semantic prototypes to align sketch and 3D features into a shared semantic space, facilitating category-agnostic matching and zero-shot generalization. Experiments demonstrate that the proposed method significantly outperforms state-of-the-art approaches on two public benchmarks under both category-level and zero-shot evaluation settings.
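The view-graph and coarsening idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: the ring camera layout, the k-nearest-neighbor adjacency, the parameter-free mean aggregation (standing in for a learned graph convolution), and the dot-product `select_views` scorer are all assumptions made for illustration, and the global attention branch is omitted.

```python
import math

def ring_view_positions(n_views):
    """Place n_views cameras evenly on a unit circle (an assumed layout)."""
    return [(math.cos(2 * math.pi * i / n_views),
             math.sin(2 * math.pi * i / n_views)) for i in range(n_views)]

def knn_graph(positions, k):
    """Connect each view to its k nearest views by Euclidean distance."""
    neighbors = []
    for i, p in enumerate(positions):
        others = [j for j in range(len(positions)) if j != i]
        others.sort(key=lambda j: math.dist(p, positions[j]))
        neighbors.append(others[:k])
    return neighbors

def mean_graph_conv(features, neighbors):
    """Average each view's feature with its neighbors' features.

    A parameter-free stand-in for a learned local graph convolution.
    """
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [features[i]] + [features[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def select_views(features, score_vector, keep):
    """Hypothetical view selector: dot-product score, keep the top `keep` views."""
    scores = [sum(f * w for f, w in zip(feat, score_vector)) for feat in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])  # preserve the original view order

def coarsen(features, positions, score_vector, k, keep):
    """One level: build the graph, aggregate locally, then drop low-scoring views."""
    smoothed = mean_graph_conv(features, knn_graph(positions, k))
    kept = select_views(smoothed, score_vector, keep)
    return [smoothed[i] for i in kept], [positions[i] for i in kept]

# Toy run: 12 views with 2-D features, coarsened to 6 and then to 3 views.
positions = ring_view_positions(12)
features = [[float(i), 1.0] for i in range(12)]
level1_feats, level1_pos = coarsen(features, positions, [1.0, 0.0], k=2, keep=6)
level2_feats, level2_pos = coarsen(level1_feats, level1_pos, [1.0, 0.0], k=2, keep=3)
```

Because the graph is rebuilt on the surviving views at each level, the same `k` neighbors span a wider range of viewpoints after coarsening, which is one simple way to read "progressively expanding the receptive field".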

📝 Abstract
Sketch-based 3D shape retrieval (SBSR) aims to retrieve 3D shapes that match the category of an input hand-drawn sketch. The task poses two core challenges. First, existing methods typically apply simplified aggregation strategies to independently encoded 3D multi-view features, ignoring the geometric relationships between views and multi-level details, which results in weak 3D representations. Second, traditional SBSR methods are limited to the categories seen during training, leading to poor performance in zero-shot scenarios. To address these challenges, we propose the Multi-View Hierarchical Graph Neural Network (MV-HGNN), a novel framework for SBSR. Specifically, we construct a view-level graph and capture adjacent geometric dependencies and cross-view message passing via local graph convolution and global attention. A view selector is further introduced to perform hierarchical graph coarsening, enabling a progressively larger receptive field for graph convolution and mitigating the interference of redundant views, which leads to a more discriminative hierarchical 3D representation. To enable category-agnostic alignment and mitigate overfitting to seen classes, we leverage CLIP text embeddings as semantic prototypes and project both sketch and 3D features into a shared semantic space. We use a two-stage training strategy for category-level retrieval and a one-stage strategy for zero-shot retrieval under the same model architecture. Extensive experiments on two public benchmarks demonstrate that MV-HGNN outperforms state-of-the-art methods under both category-level and zero-shot settings.
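As a rough illustration of aligning both modalities to semantic prototypes, the sketch below scores a feature against per-class prototype vectors and ranks shapes by cosine similarity in the shared space. The prototype vectors here are hand-written stand-ins for CLIP text embeddings (no CLIP model is called), and the temperature value, the cross-entropy objective, and all function names are assumptions; the abstract does not specify the loss.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def prototype_logits(feature, prototypes, temperature=0.07):
    """Similarity of one feature to every class prototype (temperature is assumed)."""
    return {name: cosine(feature, p) / temperature for name, p in prototypes.items()}

def cross_entropy(logits, label):
    """Softmax cross-entropy over prototype logits (an assumed training objective)."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[label]

def retrieve(sketch, shapes):
    """Rank shape names by cosine similarity to the sketch in the shared space."""
    return sorted(shapes, key=lambda name: cosine(sketch, shapes[name]), reverse=True)

# Stand-in prototypes; in the paper these would come from CLIP text embeddings.
prototypes = {"chair": [1.0, 0.0, 0.0], "airplane": [0.0, 1.0, 0.0]}
sketch = [0.9, 0.1, 0.0]                       # projected sketch feature (toy values)
shapes = {"shape_a": [0.8, 0.2, 0.1],          # projected 3D features (toy values)
          "shape_b": [0.1, 0.9, 0.0]}

loss_correct = cross_entropy(prototype_logits(sketch, prototypes), "chair")
loss_wrong = cross_entropy(prototype_logits(sketch, prototypes), "airplane")
ranking = retrieve(sketch, shapes)
```

Since retrieval only compares features in the shared space, a class unseen during training needs no retraining at query time, which is the intuition behind category-agnostic matching.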
Problem

Research questions and friction points this paper is trying to address.

Sketch-Based 3D Shape Retrieval
Multi-View Representation
Zero-Shot Retrieval
Geometric Relationships
Category-Agnostic Alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-View Graph Neural Network
Hierarchical Graph Coarsening
Sketch-Based 3D Retrieval
Zero-Shot Learning
CLIP Semantic Embedding
Hang Cheng
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China
Muyan He
Control and Simulation Center, Harbin Institute of Technology, Harbin, Heilongjiang
Mingyu Fan
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China
Chengfeng Xie
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China
Xi Cheng
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China
Long Zeng
Tsinghua University
Intelligent Manufacturing, Embodied AI Robotics, Sketch Modeling