Rethinking Transparent Object Grasping: Depth Completion with Monocular Depth Estimation and Instance Mask

📅 2025-08-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Transparent objects cause depth sensor data incompleteness or distortion due to their optical properties, severely degrading robotic grasping accuracy and robustness. Existing RGB-D end-to-end depth completion methods rely on implicit modeling of depth reliability, resulting in poor generalization in real-world scenarios. This paper proposes an instance-mask-guided depth completion framework: it first explicitly localizes transparent regions via instance segmentation, then integrates geometric priors from monocular depth estimation to construct structured contextual constraints—thereby significantly reducing dependence on implicit reasoning. Evaluated on standard benchmarks (e.g., Trans10K) and real robotic grasping setups, our method achieves state-of-the-art performance, reducing depth error by 23.6% and improving grasp success rate by 18.4%. These results validate the effectiveness of explicit transparent-region modeling and multi-source depth fusion.

Technology Category

Application Category

📝 Abstract
Due to the optical properties, transparent objects often lead depth cameras to generate incomplete or invalid depth data, which in turn reduces the accuracy and reliability of robotic grasping. Existing approaches typically input the RGB-D image directly into the network to output the complete depth, expecting the model to implicitly infer the reliability of depth values. However, while effective in training datasets, such methods often fail to generalize to real-world scenarios, where complex light interactions lead to highly variable distributions of valid and invalid depth data. To address this, we propose ReMake, a novel depth completion framework guided by an instance mask and monocular depth estimation. By explicitly distinguishing transparent regions from non-transparent ones, the mask enables the model to concentrate on learning accurate depth estimation in these areas from RGB-D input during training. This targeted supervision reduces reliance on implicit reasoning and improves generalization to real-world scenarios. Additionally, monocular depth estimation provides depth context between the transparent object and its surroundings, enhancing depth prediction accuracy. Extensive experiments show that our method outperforms existing approaches on both benchmark datasets and real-world scenarios, demonstrating superior accuracy and generalization capability. Code and videos are available at https://chengyaofeng.github.io/ReMake.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Incomplete depth data from transparent objects reduces robotic grasping accuracy
Existing methods fail to generalize due to complex light interactions
Proposing ReMake for accurate depth completion using instance masks and monocular depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instance mask guides transparent object depth completion
Monocular depth estimation enhances depth context
Targeted supervision improves real-world generalization
🔎 Similar Papers
No similar papers found.
Yaofeng Cheng
Yaofeng Cheng
Harbin Institute of Technology, University of Liverpool, WuHan University of Technology
GraspingNetwork pruning
X
Xinkai Gao
State Key Laboratory of Robotics and System at Harbin Institute of Technology, Harbin 150001, China
S
Sen Zhang
State Key Laboratory of Robotics and System at Harbin Institute of Technology, Harbin 150001, China
F
Fusheng Zha
State Key Laboratory of Robotics and System at Harbin Institute of Technology, Harbin 150001, China; Lanzhou University of Technology
C
Chao Zeng
Department of Computer Science, University of Liverpool, L69 3BX Liverpool, U.K.
L
Lining Sun
State Key Laboratory of Robotics and System at Harbin Institute of Technology, Harbin 150001, China
Chenguang Yang
Chenguang Yang
Chair Professor in Robotics, Fellow of IEEE, IET, IMechE, AIAA, BCS
Robotics