🤖 AI Summary
This work addresses the challenge of enabling robots to perceive physical properties of soft objects, particularly mechanical compliance, through tactile sensing, thereby supporting human-like dexterous manipulation.
Method: We propose a “latent filter” model: an unsupervised, action-conditioned deep state-space model that maps multimodal tactile signals (normal force, shear force, vibration) and controllable interaction primitives (pressing, precession, sliding) onto an interpretable latent space, explicitly capturing the causal influence of embodied interaction strategies on perception. A modular, mechanically tunable electronic skin (e-Skin) platform is employed for systematic evaluation.
Contribution/Results: Our approach demonstrates that multimodal tactile sensing outperforms unimodal alternatives; moreover, it infers soft-object mechanical properties within a structured latent space. This points toward tactile-driven autonomous manipulation grounded in interpretable, dynamics-aware representation learning.
📝 Abstract
To enable robots to develop human-like fine manipulation, it is essential to understand how mechanical compliance, multi-modal sensing, and purposeful interaction jointly shape tactile perception. In this study, we use a dedicated modular e-Skin with tunable mechanical compliance and multi-modal sensing (normal force, shear force, and vibration) to systematically investigate how sensing embodiment and interaction strategies influence robotic perception of objects. Leveraging a curated set of soft wave objects with controlled viscoelastic and surface properties, we explore a rich set of palpation primitives (pressing, precession, and sliding) that vary indentation depth, frequency, and directionality. In addition, we propose the latent filter, an unsupervised, action-conditioned deep state-space model that captures the sophisticated interaction dynamics and infers causal mechanical properties in a structured latent space. This provides a generalizable, in-depth, and interpretable representation of how embodiment and interaction determine and influence perception. Our investigation demonstrates that multi-modal sensing outperforms uni-modal sensing. It also highlights a nuanced interaction between the environment and the mechanical properties of the e-Skin, which should be examined alongside the interaction by incorporating temporal dynamics.
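
To make the idea of an action-conditioned state-space filter concrete, the following is a minimal sketch, not the latent filter itself. It swaps the deep, learned model for a scalar linear-Gaussian model with a Kalman-style predict/update step, so that the role of the action in the latent transition is easy to see. All names (`LinearActionSSM`, `filter_step`, `run_filter`) and all coefficients are hypothetical and chosen only for illustration; the abstract does not specify the model's architecture or parameters.

```python
from dataclasses import dataclass


@dataclass
class LinearActionSSM:
    """Toy scalar stand-in for an action-conditioned state-space model.

    Assumed (hypothetical) generative model:
        z_t = a_coef * z_{t-1} + b_coef * action_t + process noise (variance q)
        x_t = c_coef * z_t + observation noise (variance r)
    """
    a_coef: float  # persistence of the latent state
    b_coef: float  # influence of the action on the latent state
    c_coef: float  # mapping from latent state to the tactile observation
    q: float       # process noise variance
    r: float       # observation noise variance


def filter_step(model, mean, var, action, obs):
    """One predict/update step of a scalar Kalman filter with an action input."""
    # Predict: push the current belief through the action-conditioned transition.
    pred_mean = model.a_coef * mean + model.b_coef * action
    pred_var = model.a_coef ** 2 * var + model.q

    # Update: correct the prediction using the new observation.
    innovation = obs - model.c_coef * pred_mean
    innovation_var = model.c_coef ** 2 * pred_var + model.r
    gain = pred_var * model.c_coef / innovation_var
    new_mean = pred_mean + gain * innovation
    new_var = (1.0 - gain * model.c_coef) * pred_var
    return new_mean, new_var


def run_filter(model, actions, observations, mean0=0.0, var0=1.0):
    """Filter a whole sequence and return the (mean, variance) belief per step."""
    mean, var = mean0, var0
    trajectory = []
    for action, obs in zip(actions, observations):
        mean, var = filter_step(model, mean, var, action, obs)
        trajectory.append((mean, var))
    return trajectory
```

In this toy setting, `action` could stand for a commanded indentation depth and `obs` for a single normal-force reading. A deep state-space model would replace the linear transition and observation maps with learned networks and a vector-valued latent state, but the predict-then-update structure conditioned on the action is the part this sketch is meant to illustrate.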