EHWGesture -- A dataset for multimodal understanding of clinical gestures

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clinical gesture understanding faces several challenges: complex spatiotemporal dynamics, scarce multimodal and multi-view data, limited hand-tracking accuracy, and the absence of action quality assessment. To address these, this work introduces the first multimodal, synchronized video dataset specifically designed for clinical assessment, comprising five clinically relevant gestures and over 1,100 sequences of RGB-D, event-camera, and optical motion-capture data. It also formalizes action quality assessment as a core task in clinical gesture understanding, providing high-precision ground-truth hand keypoint annotations alongside rigorous cross-device spatiotemporal alignment and calibration. Baseline experiments demonstrate strong discriminative performance across three tasks (gesture classification, trigger detection, and action quality assessment), establishing the dataset as a reproducible, multimodal benchmark with quality-aware annotations for clinical hand function evaluation.

📝 Abstract
Hand gesture understanding is essential for several applications in human-computer interaction, including automatic clinical assessment of hand dexterity. While deep learning has advanced static gesture recognition, dynamic gesture understanding remains challenging due to complex spatiotemporal variations. Moreover, existing datasets often lack multimodal and multi-view diversity, precise ground-truth tracking, and an action quality component embedded within gestures. This paper introduces EHWGesture, a multimodal video dataset for gesture understanding featuring five clinically relevant gestures. It includes over 1,100 recordings (6 hours), captured from 25 healthy subjects using two high-resolution RGB-Depth cameras and an event camera. A motion capture system provides precise ground-truth hand landmark tracking, and all devices are spatially calibrated and synchronized to ensure cross-modal alignment. Moreover, to embed an action quality task within gesture understanding, collected recordings are organized in classes of execution speed that mirror clinical evaluations of hand dexterity. Baseline experiments highlight the dataset's potential for gesture classification, gesture trigger detection, and action quality assessment. Thus, EHWGesture can serve as a comprehensive benchmark for advancing multimodal clinical gesture understanding.
Problem

Research questions and friction points this paper is trying to address.

Dynamic gesture understanding under complex spatiotemporal variations
Lack of multimodal, multi-view datasets with precise ground-truth hand tracking
Need for an action quality assessment component embedded within gestures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal RGB-Depth and event camera capture
Motion capture system for precise hand landmark tracking
Synchronized cross-modal alignment for clinical gesture analysis
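The cross-device synchronization listed above typically comes down to matching per-frame timestamps across sensors running at different rates. As a minimal illustrative sketch (not the authors' pipeline; the function name, tolerance, and frame rates below are hypothetical), nearest-timestamp matching between a reference stream and a faster stream can be done as:

```python
import numpy as np

def align_streams(ts_ref, ts_other, tol=0.005):
    """For each reference timestamp, find the nearest frame in the other
    (sorted) stream; drop pairs farther apart than `tol` seconds.
    Returns matched index arrays (into ts_ref, into ts_other)."""
    idx = np.searchsorted(ts_other, ts_ref)
    idx = np.clip(idx, 1, len(ts_other) - 1)
    left, right = ts_other[idx - 1], ts_other[idx]
    # Pick whichever neighbor is closer to the reference timestamp
    nearest = np.where(ts_ref - left < right - ts_ref, idx - 1, idx)
    keep = np.abs(ts_other[nearest] - ts_ref) <= tol
    return np.flatnonzero(keep), nearest[keep]

# Example: a 30 fps RGB-D stream aligned to 120 Hz motion-capture timestamps
rgb_ts = np.arange(0, 1, 1 / 30)
mocap_ts = np.arange(0, 1, 1 / 120)
i_rgb, i_mocap = align_streams(rgb_ts, mocap_ts)
```

In practice, streams would also share a common clock (e.g., via a hardware trigger) before this matching step, since per-device clocks drift.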
Gianluca Amprimo
Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
Alberto Ancilotto
Fondazione Bruno Kessler, Trento, Italy
Alessandro Savino
Associate Professor - Politecnico di Torino, DAUIN
Dependability, Edge Computing, Approximate Computing, Computing Architectures, Bioinformatics
Fabio Quazzolo
Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
Claudia Ferraris
CNR-IEIIT, Torino, Italy
Gabriella Olmo
Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
Elisabetta Farella
ICT Center - FBK
Wireless Sensor Networks, Embedded Systems, Body Area Networks, Human-Computer Interaction, Internet of Things
Stefano Di Carlo
Full Professor, Politecnico di Torino
test, reliability, bioinformatics, cybersecurity, neuromorphic computing