SeeThruFinger: See and Grasp Anything with a Multi-Modal Soft Touch

📅 2023-12-15
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing approaches to robotic grasping rely heavily on external cameras and dedicated force/torque sensors, limiting portability, scalability, and integration. Method: This paper introduces SeeThruFinger, a soft tactile finger that achieves marker-free, multi-modal perception using only an intra-fingertip monocular camera. The proposed See-Thru-Network jointly integrates visual inpainting, real-time semantic segmentation of large soft-body deformations, and multi-modal feature disentanglement with joint decoding. Contribution/Results: To the authors' knowledge, this is the first framework to unify scene reconstruction, object detection, depth estimation, instance segmentation, 6D force/torque estimation, and contact event detection from a single visual input modality. Experiments demonstrate below 8.2% error in 6D wrench estimation and a 0.94 F1-score for contact detection, without external cameras or fingertip force sensors. Sub-millisecond inference latency enables real-time, concurrent multi-task execution. This work establishes a new paradigm for fully vision-driven, omnidirectional, adaptive tactile perception.
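As a concrete illustration of the joint-decoding idea in the summary, here is a minimal PyTorch sketch of a single-encoder, multi-head network that maps one in-finger image to all task outputs at once. The class name SeeThruNetSketch, the head names, and the layer sizes are illustrative assumptions, not the actual See-Thru-Network architecture or the repository's API.

```python
# Hypothetical multi-task sketch (NOT the repository's actual model).
# A single in-finger image is encoded once, then decoded by task-specific
# heads for inpainting, deformation segmentation, 6D wrench regression,
# and contact classification -- mirroring the joint-decoding idea above.
import torch
import torch.nn as nn

class SeeThruNetSketch(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Shared encoder over the single in-finger camera frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Dense heads: inpainted RGB scene and deformation mask.
        self.inpaint_head = nn.Conv2d(feat_dim, 3, 1)
        self.mask_head = nn.Conv2d(feat_dim, 1, 1)
        # Vector heads: 6D force/torque and a binary contact logit.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.wrench_head = nn.Linear(feat_dim, 6)
        self.contact_head = nn.Linear(feat_dim, 1)

    def forward(self, img: torch.Tensor) -> dict:
        f = self.encoder(img)                # (B, C, H/4, W/4)
        g = self.pool(f).flatten(1)          # (B, C)
        return {
            "inpainted": torch.sigmoid(self.inpaint_head(f)),
            "deform_mask": torch.sigmoid(self.mask_head(f)),
            "wrench": self.wrench_head(g),   # [Fx, Fy, Fz, Tx, Ty, Tz]
            "contact": torch.sigmoid(self.contact_head(g)),
        }

# Usage: one forward pass yields all modalities from one image.
outs = SeeThruNetSketch()(torch.rand(1, 3, 224, 224))
print({k: tuple(v.shape) for k, v in outs.items()})
```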
📝 Abstract
We present SeeThruFinger, a Vision-Based Tactile Sensing (VBTS) architecture built on a markerless See-Thru-Network. It achieves simultaneous visual perception and tactile sensing while providing omni-directional, adaptive grasping for manipulation. Multi-modal perception of intrinsic and extrinsic interactions is critical to building intelligent robots that learn. Rather than adding separate sensors for each modality, a preferable solution is to integrate them into one elegant and coherent design, which is a challenging task. This study leverages in-finger vision to inpaint the regions of the external environment occluded by the finger, achieving coherent scene reconstruction for visual perception. By tracking real-time segmentation of the Soft Polyhedral Network's large-scale deformation, we achieve real-time, markerless tactile sensing of 6D forces and torques. We demonstrate the performance of SeeThruFinger in reactive grasping without external cameras or dedicated force/torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further show that the SeeThruFinger architecture simultaneously achieves multiple capabilities, including but not limited to scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force-and-torque sensing, and contact event detection, all from a single in-finger vision input of the See-Thru-Network in a markerless way. All code is available at https://github.com/ancorasir/SeeThruFinger.
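To make the markerless force-from-deformation idea tangible, below is a toy NumPy sketch that summarizes a binary deformation mask into a few geometric features and maps them linearly to a 6D wrench. The feature choice and the calibration matrix W are stand-in assumptions; the paper instead regresses forces and torques from tracked soft-body deformation with the See-Thru-Network.

```python
# Illustrative only: markerless wrench estimation from a deformation mask.
# Feature extraction (mask area, centroid shift) is a stand-in assumption,
# not the paper's method, which learns the mapping from tracked deformation.
import numpy as np

def mask_features(mask: np.ndarray, rest_centroid=(0.5, 0.5)) -> np.ndarray:
    """Summarize a binary deformation mask as a small feature vector."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return np.zeros(3)
    h, w = mask.shape
    area = len(xs) / mask.size                  # normalized deformed area
    cx, cy = xs.mean() / w, ys.mean() / h       # centroid in [0, 1]
    return np.array([area, cx - rest_centroid[0], cy - rest_centroid[1]])

def estimate_wrench(features: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Linear map from mask features to [Fx, Fy, Fz, Tx, Ty, Tz].
    In practice, W would be fit on calibrated force/torque data."""
    return W @ features

rng = np.random.default_rng(0)
mask = (rng.random((64, 64)) > 0.7).astype(np.uint8)
W = rng.standard_normal((6, 3)) * 0.1           # placeholder calibration
print(estimate_wrench(mask_features(mask), W))
```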
Problem

Research questions and friction points this paper addresses.

Develops vision-based perception for soft robots using a single visual input
Enables reactive grasping without external cameras or force sensors
Achieves multi-modal perception including force sensing and object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-based see-through perception via a single visual input
Markerless multi-modal sensing using a soft robotic finger
Learning reactive grasping without external sensors (see the control-loop sketch below)
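Below is a minimal sketch of how contact and wrench estimates alone could close a reactive-grasp loop, as referenced in the last bullet. The model, camera, and gripper interfaces are hypothetical placeholders, not the SeeThruFinger hardware or software API; the model is assumed to return a contact probability and a 6-vector wrench as in the sketch above.

```python
# Hypothetical reactive-grasp loop driven only by in-finger vision outputs.
# `model`, `camera`, and `gripper` are illustrative placeholders; `out` is
# assumed to hold a contact probability and a [Fx, Fy, Fz, Tx, Ty, Tz] wrench.
import time

FORCE_LIMIT_N = 2.0  # assumed safe grip-force threshold

def reactive_grasp(model, camera, gripper, timeout_s: float = 5.0) -> bool:
    """Close until contact is detected, then hold once the grip is firm."""
    t0 = time.monotonic()
    while time.monotonic() - t0 < timeout_s:
        out = model(camera.read())         # single in-finger image in
        if out["contact"] > 0.5:           # contact event detected
            fz = abs(out["wrench"][2])     # normal force component
            if fz >= FORCE_LIMIT_N:
                gripper.hold()             # firm enough: stop closing
                return True
        gripper.step_close()               # otherwise keep closing
    gripper.open()                         # timed out without a grasp
    return False
```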
Fang Wan
School of Design, Southern University of Science and Technology, Shenzhen, China 518055
Zheng Wang
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China 518055
Wei Zhang
School of System Design and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China 518055
Chaoyang Song
Design and Learning Research Group, Southern University of Science and Technology, Shenzhen, China 518055