🤖 AI Summary
This work addresses the challenge of precise target selection via pointing gestures in a planar workspace during human-robot collaboration. We propose a method integrating human pose estimation with a shoulder-wrist geometric extension model, leveraging RGB-D data for robust object localization. A systematic pointing gesture evaluation framework is established, quantifying accuracy, robustness, and interaction naturalness. Furthermore, we design and implement a multimodal collaborative robot prototype supporting gesture recognition, speech transcription, and speech synthesis. Experiments demonstrate that our approach achieves high localization accuracy (mean error < 3.2 cm) and remains stable under challenging conditions, including occlusion and multi-person interference. All source code is publicly released, providing a reproducible benchmark and technical foundation for research on pointing-based interaction.
📝 Abstract
Pointing gestures are a common interaction method in Human-Robot Collaboration, used for tasks ranging from selecting targets to guiding industrial processes. This study introduces a method for localizing pointed targets within a planar workspace. The approach employs pose estimation and a simple geometric model based on shoulder-wrist extension to extract gesturing data from an RGB-D stream. The study proposes a rigorous methodology and comprehensive analysis for evaluating pointing gestures and target selection in typical robotic tasks. Beyond the accuracy evaluation, the tool is integrated into a proof-of-concept robotic system that combines object detection, speech transcription, and speech synthesis, demonstrating the integration of multiple modalities in a collaborative application. Finally, the tool's limitations and performance are discussed to clarify its role in multimodal robotic systems. All developments are available at: https://github.com/NMKsas/gesture_pointer.git.
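To illustrate the core idea of a shoulder-wrist extension model, the sketch below casts a ray from the shoulder joint through the wrist joint and intersects it with the workspace plane. This is a minimal, hypothetical reconstruction from the description above, not the repository's actual implementation; joint positions are assumed to come from a pose estimator in the RGB-D camera frame, and the plane is assumed known from calibration.

```python
import numpy as np

def locate_pointed_target(shoulder, wrist, plane_point, plane_normal):
    """Intersect the shoulder->wrist pointing ray with a planar workspace.

    Hypothetical sketch: the pointing ray originates at the shoulder and
    passes through the wrist; the pointed target is its intersection with
    the plane defined by `plane_point` and `plane_normal`. All inputs are
    3-D points/vectors in a common (camera) frame. Returns None if the
    ray is parallel to the plane or points away from it.
    """
    shoulder = np.asarray(shoulder, dtype=float)
    direction = np.asarray(wrist, dtype=float) - shoulder
    plane_normal = np.asarray(plane_normal, dtype=float)
    denom = direction @ plane_normal
    if abs(denom) < 1e-9:          # ray parallel to the plane
        return None
    t = ((np.asarray(plane_point, dtype=float) - shoulder) @ plane_normal) / denom
    if t <= 0:                     # intersection behind the shoulder
        return None
    return shoulder + t * direction

# Example: tabletop plane z = 0; shoulder 1 m above it, pointing down-forward.
target = locate_pointed_target(
    shoulder=[0.0, 0.0, 1.0],
    wrist=[0.1, 0.0, 0.8],
    plane_point=[0.0, 0.0, 0.0],
    plane_normal=[0.0, 0.0, 1.0],
)
```

In practice, per-frame joint estimates are noisy, so a real system would temporally filter the ray (or the resulting intersection point) before selecting a target, which is one reason the paper's evaluation framework measures robustness as well as raw accuracy.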