Geometry-Guided Modeling of Foundation Features Enables Generalizable Object Shape Deformation Learning

Date: 2026-05-28
AI Summary
Monocular 3D shape reconstruction methods struggle to generalize across arbitrary viewpoints and unseen object categories. This work reconstructs objects by explicitly deforming a category-level shape template to match the target observation. It uses geometry-guided feature modeling and view-adaptive feature aggregation to bridge the geometric and representational gaps between the fixed template and the target. Fusing multi-view template features improves reconstruction consistency, and the method significantly outperforms existing approaches under large deformations and diverse viewpoints. It generalizes well to unseen categories and supports downstream real-world dexterous robotic manipulation tasks.
Abstract
Monocular 3D shape recovery is fundamental to geometric understanding, yet achieving robust generalization across arbitrary viewpoints and unseen object categories remains a significant challenge. In this paper, we present a generalizable deformation learning framework that reconstructs 3D objects by explicitly deforming a category-level shape template to match the target observation. To address complex shape variations between the template and the target, we introduce a geometry-guided feature modeling mechanism. This process first enriches foundation features with template topology to yield a geometry-aware representation, which is then explicitly correlated with the target observation to guide precise deformation. Furthermore, to bridge the disparity between the fixed template and arbitrary target views, we propose a view-adaptive feature aggregation module. This module leverages multi-view template features and their corresponding camera poses to enrich the canonical template representation, ensuring robust feature alignment regardless of the target's perspective. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods in handling large shape variations and diverse viewpoints, exhibiting strong generalization to novel categories and effectively supporting downstream real-world dexterous robotic manipulation tasks. Project homepage: https://GODeform.github.io/
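The abstract names two mechanisms: explicit deformation of a category-level template, and aggregation of multi-view template features conditioned on camera poses. The sketch below illustrates the general shape of these two ideas under simplifying assumptions. It is not the authors' implementation. The cosine-similarity softmax weighting, the temperature `tau`, and the helper names `view_weights`, `aggregate_features`, and `deform_template` are all hypothetical. The per-vertex offsets, which the paper's method would predict from geometry-aware features, are supplied by hand.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a 3D direction vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def view_weights(view_dirs: Sequence[Sequence[float]],
                 target_dir: Sequence[float],
                 tau: float = 0.5) -> List[float]:
    """Hypothetical view-adaptive weighting: softmax over the cosine similarity
    between each template camera's viewing direction and the target's."""
    t = _normalize(target_dir)
    scores = [sum(a * b for a, b in zip(_normalize(d), t)) / tau for d in view_dirs]
    peak = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_features(per_view_feats: Sequence[Matrix],
                       weights: Sequence[float]) -> Matrix:
    """Blend per-vertex template features from K views into one canonical
    feature per vertex; per_view_feats[k][i][c] is view k, vertex i, channel c."""
    n_verts = len(per_view_feats[0])
    n_ch = len(per_view_feats[0][0])
    fused = [[0.0] * n_ch for _ in range(n_verts)]
    for w, feats in zip(weights, per_view_feats):
        for i in range(n_verts):
            for c in range(n_ch):
                fused[i][c] += w * feats[i][c]
    return fused


def deform_template(vertices: Matrix, offsets: Matrix) -> Matrix:
    """Explicit deformation: displace each template vertex by its 3D offset."""
    return [[v + d for v, d in zip(vert, off)] for vert, off in zip(vertices, offsets)]


if __name__ == "__main__":
    # Two template views, frontal (+z) and side (+x); the target is seen frontally.
    w = view_weights([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]], [0.0, 0.0, 1.0])
    # One vertex with a 2-channel feature per view (hypothetical values).
    fused = aggregate_features([[[1.0, 0.0]], [[0.0, 1.0]]], w)
    # A learned network would predict these offsets; here they are fixed by hand.
    shape = deform_template([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                            [[0.0, 0.0, 0.1], [0.0, 0.2, 0.0]])
    print(w, fused, shape)
```

Weighting views by their angular proximity to the target is only one plausible way to make aggregation view-adaptive. A learned attention module over features and pose encodings would be another.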
Problem

Research questions and challenges this paper addresses.

monocular 3D shape recovery
generalization
unseen object categories
arbitrary viewpoints
shape deformation

Innovation

Methods, ideas, or system contributions that distinguish this work.

geometry-guided feature modeling
view-adaptive feature aggregation
generalizable shape deformation
monocular 3D reconstruction
template-based deformation

Authors
Yiyao Ma
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Kai Chen
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Zhongxiang Zhou
Zhejiang Innovation Center for Humanoid Robotics, Ningbo, China
Zhuheng Song
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Dongsheng Xie
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Zelong Tan
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Rong Xiong
Zhejiang University
Robotics
Qi Dou
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China