🤖 AI Summary
This work addresses the limitations of existing eye-tracking–driven assistive robotic systems, which struggle with insufficient 3D gaze estimation accuracy and inadequate multi-task intent recognition, thereby hindering severely motor-impaired users from independently performing daily tasks. To overcome these challenges, the authors propose a shared-control framework that leverages task pictograms as visual fiducial markers. By integrating feature matching with state-of-the-art object detection models under an eye-in-hand camera configuration, the system achieves high-precision object and task recognition without requiring knowledge of the user's spatial position relative to target objects. The approach attains up to 97.9% accuracy in object and task selection, supports extension to new tasks and objects, and is released as open-source software. The paper also distills practical lessons learned during real-world evaluation.
📝 Abstract
Shared control improves human-robot interaction by reducing the user's workload and increasing the robot's autonomy, allowing robots to perform tasks under the user's supervision. Current eye-tracking-driven approaches face several challenges, including limited accuracy in 3D gaze estimation and difficulty interpreting gaze when differentiating between multiple tasks. We present an eye-tracking-driven control framework aimed at enabling individuals with severe physical disabilities to perform daily tasks independently. Our system uses task pictograms as fiducial markers, combined with a feature-matching approach, to transmit data about the selected object and carry out the necessary task-related measurements with an eye-in-hand camera configuration. This eye-tracking control does not require knowledge of the user's position relative to the object. The framework correctly interpreted object and task selection in up to 97.9% of measurements. Issues identified during the evaluation were addressed and are shared as lessons learned. The open-source framework can be adapted to new tasks and objects through its integration of state-of-the-art object detection models.
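The pictogram-selection idea in the abstract rests on descriptor matching: the gazed-at region is compared against a catalog of known pictograms, and the best-scoring entry determines the object and task. The sketch below is purely illustrative, not the paper's implementation: the 8-dimensional descriptors, the toy catalog, and the `identify_pictogram` helper are all assumptions, standing in for real feature extraction (e.g. from a fiducial-marker detector) with Lowe's ratio test used to reject ambiguous matches.

```python
import numpy as np

def count_good_matches(query, reference, ratio=0.75):
    """Count query descriptors whose nearest reference descriptor
    passes Lowe's ratio test (nearest distinctly closer than 2nd-nearest)."""
    good = 0
    for q in query:
        dists = np.linalg.norm(reference - q, axis=1)
        order = np.argsort(dists)
        if len(dists) >= 2 and dists[order[0]] < ratio * dists[order[1]]:
            good += 1
    return good

def identify_pictogram(query, catalog, ratio=0.75):
    """Score every catalog pictogram against the query descriptors
    and return the best-matching name plus all scores."""
    scores = {name: count_good_matches(query, ref, ratio)
              for name, ref in catalog.items()}
    return max(scores, key=scores.get), scores

# Toy catalog of two hypothetical task pictograms (names are made up).
rng = np.random.default_rng(0)
drink = rng.normal(size=(20, 8))      # descriptors of a "drink" pictogram
open_door = rng.normal(size=(20, 8))  # descriptors of an "open door" pictogram
catalog = {"drink": drink, "open_door": open_door}

# Simulated camera view: a noisy observation of the "drink" pictogram.
query = drink + rng.normal(scale=0.05, size=drink.shape)
name, scores = identify_pictogram(query, catalog)
print(name)  # → drink
```

In a deployed system the descriptors would come from the eye-in-hand camera image around the user's gaze point; the ratio test is what makes the selection robust when several pictograms share similar features.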