🤖 AI Summary
Most existing tactile sensors rely on macroscopic surface deformation, making them unreliable for detecting liquids, semi-liquids, and ultra-soft materials that induce negligible deformation. To address this, we propose LightTact, an optical tactile fingertip sensor whose ambient-light-shielded optical configuration captures only the diffusely scattered light generated at the contact interface. This yields pixel-level contact segmentation with near-zero background noise (mean gray value < 3) that is robust to material properties, contact force, surface appearance, and ambient illumination. Leveraging its high-contrast, spatially aligned visual-tactile images, LightTact executes delicate robotic tasks under minimal-contact conditions, including water spreading, facial-cream dipping, and thin-film manipulation. The same images can be interpreted directly by off-the-shelf vision-language models (VLMs), enabling resistor-value identification for robotic sorting. This work departs from deformation-dependent sensing paradigms, establishing a robust, generalizable, and interpretable framework for light-contact perception in soft-interaction scenarios.
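Because non-contact pixels in the raw image stay near-black, the reported pixel-level segmentation can in principle reduce to a fixed grayscale threshold. The sketch below illustrates that idea in Python with OpenCV; the threshold value, morphological cleanup, and image path are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of threshold-based contact segmentation on a LightTact-style
# raw frame, where non-contact pixels are near-black (mean gray value < 3).
# Threshold, kernel size, and file name are assumptions for illustration.
import cv2
import numpy as np

def segment_contact(raw_bgr: np.ndarray, gray_thresh: int = 10) -> np.ndarray:
    """Return a binary mask marking contact pixels in a raw frame."""
    gray = cv2.cvtColor(raw_bgr, cv2.COLOR_BGR2GRAY)
    # Anything clearly brighter than the near-black background counts as contact.
    _, mask = cv2.threshold(gray, gray_thresh, 255, cv2.THRESH_BINARY)
    # Morphological opening removes isolated noise pixels from the mask.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    return mask

frame = cv2.imread("lighttact_frame.png")  # hypothetical example frame
contact_mask = segment_contact(frame)
print(f"contact pixels: {int(np.count_nonzero(contact_mask))}")
```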
📝 Abstract
Contact often occurs without macroscopic surface deformation, such as during interaction with liquids, semi-liquids, or ultra-soft materials. Most existing tactile sensors rely on deformation to infer contact, making such light-contact interactions difficult to perceive robustly. To address this, we present LightTact, a visual-tactile fingertip sensor that makes contact directly visible via a deformation-independent, optics-based principle. LightTact uses an ambient-blocking optical configuration that suppresses both external light and internal illumination at non-contact regions, while transmitting only the diffuse light generated at true contacts. As a result, LightTact produces high-contrast raw images in which non-contact pixels remain near-black (mean gray value < 3) and contact pixels preserve the natural appearance of the contacting surface. Building on this, LightTact achieves accurate pixel-level contact segmentation that is robust to material properties, contact force, surface appearance, and environmental lighting. We further mount LightTact on a robotic arm and demonstrate manipulation behaviors driven by extremely light contact, including water spreading, facial-cream dipping, and thin-film interaction. Finally, we show that LightTact's spatially aligned visual-tactile images can be directly interpreted by existing vision-language models, enabling resistor-value reasoning for robotic sorting.
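The final claim, that existing VLMs can read the fused visual-tactile images directly, maps onto a standard multimodal chat call. The paper does not name a specific model or API; the following sketch assumes the OpenAI Python client, a "gpt-4o"-class model, and a hypothetical image path.

```python
# Hedged sketch: querying an off-the-shelf VLM with a fused LightTact frame.
# Model choice, prompt wording, and file path are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("fused_visual_tactile.png", "rb") as f:  # hypothetical frame
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "This image fuses a camera view with a tactile contact map "
                     "of a resistor. Read the color bands and state its value."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # e.g. a stated resistance value
```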