Recognition and Estimation of Human Finger Pointing with an RGB Camera for Robot Directive

📅 2023-07-06
🏛️ arXiv.org
📈 Citations: 6
Influential: 1
🤖 AI Summary
Existing pointing gesture recognition methods rely on depth cameras, are confined to indoor environments, and support only discrete target selection. To address these limitations, this paper proposes PointingNet, presented as the first end-to-end pointing gesture understanding framework that operates solely with a single RGB camera and generalizes across both indoor and outdoor scenes. Methodologically, the authors introduce arm segmentation masks to guide pointing detection and design an angular regression branch for high-precision 3D pointing direction estimation (mean angular error under 2°, a roughly 26° improvement over the state of the art). Integrating geometric projection with motion planning, the framework directly outputs robot-reachable target coordinates. Evaluated on two real-world robotic platforms, PointingNet robustly interprets natural pointing gestures and enables accurate target reaching. The approach significantly enhances the practicality and generalization of human-robot interaction in depth-sensor-free settings.
📝 Abstract
In communication between humans, gestures are often preferred over, or complementary to, verbal expression, since they offer better spatial referral. A finger pointing gesture conveys vital information regarding some point of interest in the environment. In human-robot interaction, a user can easily direct a robot to a target location, for example, in search and rescue or factory assistance. State-of-the-art approaches for visual pointing estimation often rely on depth cameras, are limited to indoor environments, and provide discrete predictions among a limited set of targets. In this paper, we explore the learning of models for robots to understand pointing directives in various indoor and outdoor environments solely based on a single RGB camera. A novel framework is proposed which includes a designated model termed PointingNet. PointingNet recognizes the occurrence of pointing and then approximates the position and direction of the index finger. The model relies on a novel segmentation model for masking any lifted arm. While state-of-the-art human pose estimation models yield a poor mean pointing-angle error of 28°, PointingNet exhibits a mean error of less than 2°. With the pointing information, the target is computed, followed by planning and motion of the robot. The framework is evaluated on two robotic systems, yielding accurate target reaching.
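The abstract's final step, computing the target from the estimated finger position and pointing direction, amounts to intersecting the pointing ray with a ground plane. A minimal sketch of that geometric projection (a hypothetical helper, not the paper's actual code; a flat ground plane is assumed):

```python
import numpy as np

def pointing_target(finger_pos, direction, ground_z=0.0):
    """Intersect a pointing ray with the horizontal ground plane z = ground_z.

    finger_pos: 3D position of the index fingertip (world frame).
    direction: 3D pointing direction (need not be normalized).
    Returns the target point on the ground, or None if the ray
    never reaches the plane.
    """
    p = np.asarray(finger_pos, dtype=float)
    d = np.asarray(direction, dtype=float)
    if abs(d[2]) < 1e-9:              # ray parallel to the ground
        return None
    t = (ground_z - p[2]) / d[2]      # ray parameter at the plane
    if t <= 0:                        # plane is behind the finger
        return None
    return p + t * d

# A fingertip 1.2 m above the ground, pointing 45° downward along +x:
target = pointing_target([0.0, 0.0, 1.2], [0.7071, 0.0, -0.7071])
# target lands roughly 1.2 m ahead of the user on the ground
```

A robot-reachable coordinate like `target` can then be handed to any motion planner; the paper's pipeline additionally handles recognition and segmentation upstream of this step.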
Problem

Research questions and friction points this paper is trying to address.

Recognizing human finger pointing gestures using a single web camera
Estimating precise pointing direction for robot interaction tasks
Enabling robots to interpret pointing commands in diverse environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a single web camera for pointing recognition in varied environments
Implements PointingNet with classification and regression branches for finger pose estimation
Introduces novel arm segmentation model to improve pointing accuracy
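The accuracy figures quoted above (under 2° for PointingNet versus 28° for pose-estimation baselines) are mean angular errors between predicted and ground-truth pointing directions. As a hedged sketch (this exact metric implementation is an assumption, not taken from the paper), the measure can be computed as:

```python
import numpy as np

def mean_angular_error_deg(pred, gt):
    """Mean angle in degrees between predicted and ground-truth 3D directions.

    pred, gt: arrays of shape (N, 3); rows need not be pre-normalized.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)  # guard arccos domain
    return float(np.degrees(np.arccos(cos)).mean())

# Two predictions, one exact and one 90° off; mean is about 45°:
err = mean_angular_error_deg([[1, 0, 0], [0, 1, 0]],
                             [[1, 0, 0], [1, 0, 0]])
```

The `np.clip` guards against floating-point dot products slightly outside [-1, 1], which would otherwise make `arccos` return NaN for near-identical vectors.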
Eran Bamani
School of Mechanical Engineering, Tel-Aviv University, Israel
Eden Nissinman
School of Mechanical Engineering, Tel-Aviv University, Israel
Lisa Koenigsberg
School of Mechanical Engineering, Tel-Aviv University, Israel
Inbar Meir
School of Mechanical Engineering, Tel-Aviv University, Israel
Yoav Matalon
School of Mechanical Engineering, Tel-Aviv University, Israel
Avishai Sintov
Tel-Aviv University
Robotics