In-Hand Object Pose Estimation via Visual-Tactile Fusion

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address degraded 6D pose estimation accuracy caused by visual occlusion during in-hand robotic manipulation, this paper proposes fusing a wrist-mounted RGB-D camera with vision-based tactile sensors mounted on the gripper's fingertips. It introduces a weighted fusion module for heterogeneous sensors that dynamically modulates the contributions of the visual and tactile modalities, together with an Iterative Closest Point (ICP) variant adapted to weighted point clouds, yielding a modality-adaptive fusion framework designed for in-hand manipulation scenarios. Experiments on a physical robotic platform demonstrate average translational and rotational errors of 7.5 mm and 16.7°, respectively, a roughly 20% improvement over a vision-only baseline. Insertion-task experiments further validate the method's robustness and practicality under occlusion-prone conditions.

📝 Abstract
Accurate in-hand pose estimation is crucial for robotic object manipulation, but visual occlusion remains a major challenge for vision-based approaches. This paper presents an approach to robotic in-hand object pose estimation, combining visual and tactile information to accurately determine the position and orientation of objects grasped by a robotic hand. We address the challenge of visual occlusion by fusing visual information from a wrist-mounted RGB-D camera with tactile information from vision-based tactile sensors mounted on the fingertips of a robotic gripper. Our approach employs a weighting and sensor fusion module to combine point clouds from heterogeneous sensor types and control each modality's contribution to the pose estimation process. We use an augmented Iterative Closest Point (ICP) algorithm adapted for weighted point clouds to estimate the 6D object pose. Our experiments show that incorporating tactile information significantly improves pose estimation accuracy, particularly when occlusion is high. Our method achieves an average pose estimation error of 7.5 mm and 16.7 degrees, outperforming vision-only baselines by up to 20%. We also demonstrate the ability of our method to perform precise object manipulation in a real-world insertion task.
Problem

Research questions and friction points this paper is trying to address.

Estimating in-hand object pose under visual occlusion
Fusing visual and tactile data for accurate pose estimation
Improving robotic manipulation via multi-sensor fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses visual and tactile sensor data
Uses weighted point cloud fusion module
Employs augmented ICP for pose estimation
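To make the core idea concrete, the sketch below shows one plausible form of ICP over a weighted point cloud: each source point carries a confidence weight (e.g., reflecting whether it came from the visual or tactile modality), and the rigid alignment at each iteration is solved with a weighted Kabsch/SVD step. The function name, interface, and brute-force nearest-neighbour matching are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def weighted_icp(source, target, weights, iters=30):
    """Hypothetical sketch of ICP with per-point weights (Nx3 clouds).

    `weights` could encode per-modality confidence (visual vs. tactile);
    the paper's exact weighting scheme is not reproduced here.
    Returns (R, t) such that R @ source + t aligns to target.
    """
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    w = weights / weights.sum()          # normalised point weights
    for _ in range(iters):
        # Nearest-neighbour correspondences (brute force, for clarity).
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        corr = target[d.argmin(axis=1)]
        # Weighted centroids and weighted cross-covariance (Kabsch).
        mu_s = (w[:, None] * src).sum(axis=0)
        mu_t = (w[:, None] * corr).sum(axis=0)
        H = (src - mu_s).T @ (w[:, None] * (corr - mu_t))
        U, _, Vt = np.linalg.svd(H)
        # Reflection guard keeps the estimate a proper rotation.
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ S @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        # Compose the incremental transform into the running estimate.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Down-weighting occluded visual points while up-weighting tactile contact points would, under this formulation, let the reliable modality dominate both the centroid estimates and the cross-covariance, which is the intuition behind the paper's modality-adaptive fusion.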
Felix Nonnengiesser
Department of Computer Science, Goethe Universitaet Frankfurt, Germany
Alap Kshirsagar
Postdoctoral Researcher at TU Darmstadt
Human-Robot Interaction · Robotics
Boris Belousov
Senior Researcher at German Research Centre for Artificial Intelligence (DFKI GmbH)
Robot Learning · Reinforcement Learning · Machine Learning · Robotics
Jan Peters
Intelligent Autonomous Systems Lab, Department of Computer Science, TU Darmstadt, Germany; German Research Center for AI (DFKI); Centre for Cognitive Science, TU Darmstadt; Hessian Center for Artificial Intelligence (Hessian.AI), Darmstadt