🤖 AI Summary
To address the challenge of object grasping for visually impaired individuals, this paper proposes the first end-to-end fully automated hand-navigation system. The method integrates YOLOv8-based object detection, ByteTrack-based multi-object tracking, and monocular depth estimation, coupled with a lightweight vibration encoding strategy and an embedded haptic feedback module—enabling robust single-object tracking, intra-class object discrimination, and dynamic obstacle-avoidance navigation in complex, unstructured environments. In real-world evaluations with blind users, the system achieves an average grasping success rate exceeding 92% across three daily tasks, with end-to-end latency under 120 ms and data transmission rate reduced to 1/20 that of conventional vision-assisted systems. The core contribution is the introduction of a closed-loop hand-navigation paradigm—the first of its kind—overcoming the limitations of static guidance and significantly enhancing user autonomy, real-time responsiveness, and practical usability.
📝 Abstract
Grasping constitutes a critical challenge for visually impaired people. To address this problem, we developed a tactile bracelet that assists in grasping by guiding the user's hand to a target object using vibration commands. Here we demonstrate the fully automated system around the bracelet, which can confidently detect and track target and distractor objects and reliably guide the user's hand. We validate our approach in three tasks that resemble complex, everyday use cases. In a grasping task, the participants grasp varying target objects on a table, guided via the automated hand navigation system. In the multiple objects task, participants grasp objects from the same class, demonstrating our system's ability to track one specific object without targeting surrounding distractor objects. Finally, the participants grasp one specific target object by avoiding an obstacle along the way in the depth navigation task, showcasing the potential to utilize our system's depth estimations to navigate even complex scenarios. Additionally, we demonstrate that the system can aid users in the real world by testing it in a less structured environment with a blind participant. Overall, our results demonstrate that the system, by translating the AI-processed visual inputs into a reduced data rate of actionable signals, enables autonomous behavior in everyday environments, thus potentially increasing the quality of life of visually impaired people.