🤖 AI Summary
This study addresses the lack of low-cost, robust, and generalizable indirect force-sensing methods for soft grippers handling unknown objects. The authors propose a model-based visual force sensing system that extracts structural keypoints of the deforming gripper from wrist-mounted RGB-D camera images and combines deep learning-based online 3D reconstruction, iterative contact localization, and inverse finite element analysis in the SOFA framework to estimate grasp force. Unlike existing model-based approaches, it is adapted to grasping with modern fin-ray-shaped soft grippers, and it remains robust to visual occlusion and previously unseen objects. In experiments, the system achieved an average RMSE of 0.23 N (NRMSE 2.11%) during the loading phase and 0.48 N (NRMSE 4.34%) over the entire grasping process, indicating its potential for real-time indirect force sensing.
📝 Abstract
Grasp force estimation can help prevent robots from damaging delicate objects during manipulation and can improve learning-based robotic control. Integrating force sensing into deformable grippers involves trade-offs among cost, complexity, mechanical robustness, and performance. With the growing integration of RGB-D wrist cameras into robotic systems for control purposes, camera-based techniques are a promising solution for indirect visual force estimation. Current approaches mostly rely on end-to-end deep learning, which can be brittle when generalizing to new scenarios, while existing model-based approaches are unsuited to grasping and to modern gripper geometries. To address these challenges, we developed a model-based visual force sensing approach that integrates iterative contact localization and generalizes to unseen objects. The system extracts structural key points from wrist camera RGB-D images of deforming fin-ray-shaped soft grippers and uses these key points to define the parameters of an inverse finite element analysis simulation in the Simulation Open Framework Architecture (SOFA). The iterative contact localization sub-system uses a deep learning-based online 3D reconstruction and pose estimation pipeline to dynamically update the contact location, and it is robust to visual occlusion and unseen objects. Our system demonstrated an average root mean square error of 0.23 N and a normalized root mean square deviation of 2.11% during the load phase, and 0.48 N and 4.34% over the entire grasping process, when interacting with different objects under various conditions, showcasing its potential for real-time model-based indirect force sensing of soft grippers.
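The two error metrics reported above can be illustrated with a minimal sketch. It assumes the normalized figure is the root mean square error divided by the range of the reference force, a common convention that the abstract does not confirm. The function names and sample force values are hypothetical and are not taken from the paper.

```python
import math


def rmse(estimated, measured):
    """Root mean square error between two equal-length force traces (in N)."""
    if len(estimated) != len(measured) or len(measured) == 0:
        raise ValueError("traces must be non-empty and of equal length")
    squared_errors = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def nrmse_percent(estimated, measured):
    """RMSE as a percentage of the reference force range.

    Assumption: normalization by (max - min) of the measured trace;
    the paper may normalize differently.
    """
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("reference trace has zero range")
    return 100.0 * rmse(estimated, measured) / span


# Hypothetical traces: reference sensor readings vs. vision-based estimates.
measured_force = [0.0, 2.0, 4.0, 6.0]
estimated_force = [0.5, 1.5, 4.5, 5.5]

print(f"RMSE:  {rmse(estimated_force, measured_force):.2f} N")
print(f"NRMSE: {nrmse_percent(estimated_force, measured_force):.2f} %")
```

Under this convention, the reported pairs (0.23 N at 2.11% and 0.48 N at 4.34%) would both correspond to a normalizing force of roughly 11 N.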