SeeThruFinger: See and Grasp Anything with a Multi-Modal Soft Touch

📅 2023-12-15
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing approaches to robotic grasping rely heavily on external cameras and dedicated force/torque sensors, limiting portability, scalability, and integration. Method: This paper introduces SeeThruFinger, a soft tactile finger that achieves marker-free, multi-modal perception using only an intra-fingertip monocular camera. The proposed See-Thru-Network jointly integrates visual inpainting, real-time semantic segmentation of large soft-body deformations, and multi-modal feature disentanglement with joint decoding. Contribution/Results: To the authors' knowledge, this is the first framework to unify scene reconstruction, object detection, depth estimation, instance segmentation, 6D force/torque estimation, and contact event detection from a single visual input modality. Experiments demonstrate below 8.2% error in 6D wrench estimation and a 0.94 F1-score for contact detection, without external cameras or fingertip force sensors. Sub-millisecond inference latency enables real-time, concurrent multi-task execution. This work establishes a new paradigm for fully vision-driven, omnidirectional, adaptive tactile perception.
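As a concrete illustration of the joint-decoding idea in the summary, here is a minimal PyTorch sketch of a single-encoder, multi-head network that maps one in-finger image to all task outputs at once. The class name SeeThruNetSketch, the head names, and the layer sizes are illustrative assumptions, not the actual See-Thru-Network architecture or the repository's API.

```python
# Hypothetical multi-task sketch (NOT the repository's actual model).
# A single in-finger image is encoded once, then decoded by task-specific
# heads for inpainting, deformation segmentation, 6D wrench regression,
# and contact classification -- mirroring the joint-decoding idea above.
import torch
import torch.nn as nn

class SeeThruNetSketch(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Shared encoder over the single in-finger camera frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Dense heads: inpainted RGB scene and deformation mask.
        self.inpaint_head = nn.Conv2d(feat_dim, 3, 1)
        self.mask_head = nn.Conv2d(feat_dim, 1, 1)
        # Vector heads: 6D force/torque and a binary contact logit.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.wrench_head = nn.Linear(feat_dim, 6)
        self.contact_head = nn.Linear(feat_dim, 1)

    def forward(self, img: torch.Tensor) -> dict:
        f = self.encoder(img)                # (B, C, H/4, W/4)
        g = self.pool(f).flatten(1)          # (B, C)
        return {
            "inpainted": torch.sigmoid(self.inpaint_head(f)),
            "deform_mask": torch.sigmoid(self.mask_head(f)),
            "wrench": self.wrench_head(g),   # [Fx, Fy, Fz, Tx, Ty, Tz]
            "contact": torch.sigmoid(self.contact_head(g)),
        }

# Usage: one forward pass yields all modalities from one image.
outs = SeeThruNetSketch()(torch.rand(1, 3, 224, 224))
print({k: tuple(v.shape) for k, v in outs.items()})
```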
📝 Abstract
We present SeeThruFinger, a Vision-Based Tactile Sensing (VBTS) architecture built on a markerless See-Thru-Network. It achieves simultaneous visual perception and tactile sensing while providing omni-directional, adaptive grasping for manipulation. Multi-modal perception of intrinsic and extrinsic interactions is critical to building intelligent robots that learn. Rather than adding separate sensors for each modality, a preferable solution is to integrate them into one elegant and coherent design, which is a challenging task. This study leverages in-finger vision to inpaint the regions of the external environment occluded by the finger, achieving coherent scene reconstruction for visual perception. By tracking real-time segmentation of the Soft Polyhedral Network's large-scale deformation, we achieve real-time, markerless tactile sensing of 6D forces and torques. We demonstrate the performance of SeeThruFinger in reactive grasping without external cameras or dedicated force/torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further show that the SeeThruFinger architecture simultaneously achieves multiple capabilities, including but not limited to scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force-and-torque sensing, and contact event detection, all from a single in-finger vision input of the See-Thru-Network in a markerless way. All code is available at https://github.com/ancorasir/SeeThruFinger.
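To make the markerless force-from-deformation idea tangible, below is a toy NumPy sketch that summarizes a binary deformation mask into a few geometric features and maps them linearly to a 6D wrench. The feature choice and the calibration matrix W are stand-in assumptions; the paper instead regresses forces and torques from tracked soft-body deformation with the See-Thru-Network.

```python
# Illustrative only: markerless wrench estimation from a deformation mask.
# Feature extraction (mask area, centroid shift) is a stand-in assumption,
# not the paper's method, which learns the mapping from tracked deformation.
import numpy as np

def mask_features(mask: np.ndarray, rest_centroid=(0.5, 0.5)) -> np.ndarray:
    """Summarize a binary deformation mask as a small feature vector."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return np.zeros(3)
    h, w = mask.shape
    area = len(xs) / mask.size                  # normalized deformed area
    cx, cy = xs.mean() / w, ys.mean() / h       # centroid in [0, 1]
    return np.array([area, cx - rest_centroid[0], cy - rest_centroid[1]])

def estimate_wrench(features: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Linear map from mask features to [Fx, Fy, Fz, Tx, Ty, Tz].
    In practice, W would be fit on calibrated force/torque data."""
    return W @ features

rng = np.random.default_rng(0)
mask = (rng.random((64, 64)) > 0.7).astype(np.uint8)
W = rng.standard_normal((6, 3)) * 0.1           # placeholder calibration
print(estimate_wrench(mask_features(mask), W))
```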
Problem

Research questions and friction points this paper addresses.

Develops vision-based perception for soft robots using a single visual input
Enables reactive grasping without external cameras or force sensors
Achieves multi-modal perception including force sensing and object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-based see-through perception via a single visual input
Markerless multi-modal sensing using a soft robotic finger
Learning reactive grasping without external sensors (see the control-loop sketch below)
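Below is a minimal sketch of how contact and wrench estimates alone could close a reactive-grasp loop, as referenced in the last bullet. The model, camera, and gripper interfaces are hypothetical placeholders, not the SeeThruFinger hardware or software API; the model is assumed to return a contact probability and a 6-vector wrench as in the sketch above.

```python
# Hypothetical reactive-grasp loop driven only by in-finger vision outputs.
# `model`, `camera`, and `gripper` are illustrative placeholders; `out` is
# assumed to hold a contact probability and a [Fx, Fy, Fz, Tx, Ty, Tz] wrench.
import time

FORCE_LIMIT_N = 2.0  # assumed safe grip-force threshold

def reactive_grasp(model, camera, gripper, timeout_s: float = 5.0) -> bool:
    """Close until contact is detected, then hold once the grip is firm."""
    t0 = time.monotonic()
    while time.monotonic() - t0 < timeout_s:
        out = model(camera.read())         # single in-finger image in
        if out["contact"] > 0.5:           # contact event detected
            fz = abs(out["wrench"][2])     # normal force component
            if fz >= FORCE_LIMIT_N:
                gripper.hold()             # firm enough: stop closing
                return True
        gripper.step_close()               # otherwise keep closing
    gripper.open()                         # timed out without a grasp
    return False
```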
Fang Wan
School of Design, Southern University of Science and Technology, Shenzhen, China 518055
Zheng Wang
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China 518055
Wei Zhang
School of System Design and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen, China 518055
Chaoyang Song
Design and Learning Research Group, Southern University of Science and Technology, Shenzhen, China 518055