🤖 AI Summary
To address the challenge of dexterous grasping for individuals with hand impairments in complex, unstructured environments, this paper proposes a point-cloud-based visual predictive control framework. The framework integrates depth sensing with 3D geometric modeling to enable environmental context understanding, real-time grasp target prediction, and closed-loop control of a soft robotic hand exoskeleton. Its core innovation is a geometry-driven grasping paradigm that removes the reliance on large-scale annotated datasets, improving generalization across diverse objects and scenes as well as robustness to environmental variations. Performance is evaluated with the Grasping Ability Score (GAS): the method achieves a state-of-the-art GAS of 91% across 15 objects with healthy participants and maintains reconstruction success on previously unseen objects.
📝 Abstract
Grasping is a fundamental skill for interacting with and manipulating objects in the environment. However, it can be challenging for individuals with hand impairments. Soft hand exoskeletons designed to assist grasping can enhance or restore essential hand functions, yet controlling these exoskeletons to support users effectively remains difficult due to the complexity of understanding the environment. This study presents a vision-based predictive control framework that leverages contextual awareness from depth perception to predict the grasping target and determine the next control state for activation. Unlike data-driven approaches that require extensive labelled datasets and struggle with generalizability, our method is grounded in geometric modelling, enabling robust adaptation across diverse grasping scenarios. The Grasping Ability Score (GAS) was used to evaluate performance: our system achieved a state-of-the-art GAS of 91% across 15 objects in tests with healthy participants, demonstrating its effectiveness across different object types. The proposed approach also maintained reconstruction success for unseen objects, underscoring its enhanced generalizability compared to learning-based models.