š¤ AI Summary
The gesture recognition community lacks a systematic survey addressing both gesture classification and 3D hand pose estimation from multimodal visual inputs (e.g., RGB, depth, single/multi-view videos), particularly regarding key challengesārobustness in real-world scenarios, occlusion handling, cross-user generalization, and real-time inference.
Method: This paper presents the first comprehensive, structured review of gesture and 3D hand pose recognition across multimodal inputs, categorizing methodologiesāincluding classical machine learning, CNNs, RNNs, Transformers, graph convolutional networks, and multi-view geometric modelingāby input modality and task. It uniformly evaluates major benchmark datasets and application contexts, and introduces a cross-modal comparative framework.
Contribution: We distill four critical open challengesārobustness, occlusion robustness, cross-user generalization, and real-time performanceāand provide a clear, forward-looking research roadmap to guide future advances in multimodal hand understanding.
š Abstract
Hand gesture recognition has become an important research area, driven by the growing demand for human-computer interaction in fields such as sign language recognition, virtual and augmented reality, and robotics. Despite the rapid growth of the field, there are few surveys that comprehensively cover recent research developments, available solutions, and benchmark datasets. This survey addresses this gap by examining the latest advancements in hand gesture and 3D hand pose recognition from various types of camera input data including RGB images, depth images, and videos from monocular or multiview cameras, examining the differing methodological requirements of each approach. Furthermore, an overview of widely used datasets is provided, detailing their main characteristics and application domains. Finally, open challenges such as achieving robust recognition in real-world environments, handling occlusions, ensuring generalization across diverse users, and addressing computational efficiency for real-time applications are highlighted to guide future research directions. By synthesizing the objectives, methodologies, and applications of recent studies, this survey offers valuable insights into current trends, challenges, and opportunities for future research in human hand gesture recognition.