AI Summary
Existing sign language processing models lack explicit, structured representations of handshape, limiting both recognition accuracy and linguistic analysis capabilities. Method: We introduce the first structured handshape recognition benchmark for sign language sequences and propose a decoupled spatiotemporal modeling framework: an anatomy-guided graph neural network encodes static handshape topology, while contrastive learning captures dynamic handshape evolution over time. This design explicitly disentangles intrinsic handshape morphology from temporal motion patterns, enhancing the discriminability of the learned representations. Contribution/Results: Our method achieves 46% accuracy on a 37-class handshape recognition task, surpassing the strongest baseline (25%) by 21 percentage points. These results empirically validate that structured, decoupled modeling is essential for advancing sign language understanding, particularly in bridging low-level visual perception with higher-level linguistic interpretation.
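To make the static branch concrete, the sketch below shows one plausible way to build an anatomy-guided hand graph and encode a single-frame handshape with a graph-convolution block. It assumes 21 hand keypoints in the MediaPipe ordering and a standard normalized-adjacency GCN layer; the paper's exact graph construction, layer count, and dimensions are not specified here.

```python
# Illustrative sketch (not the paper's implementation): an anatomy-guided hand
# graph plus a small graph-convolution encoder for one static handshape frame.
import torch
import torch.nn as nn

# Edges follow hand anatomy: wrist-to-finger bases plus intra-finger chains
# (assumed 21-keypoint MediaPipe-style layout).
HAND_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4),        # thumb
              (0, 5), (5, 6), (6, 7), (7, 8),        # index
              (0, 9), (9, 10), (10, 11), (11, 12),   # middle
              (0, 13), (13, 14), (14, 15), (15, 16), # ring
              (0, 17), (17, 18), (18, 19), (19, 20)] # pinky

def normalized_adjacency(num_nodes, edges):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    a = torch.eye(num_nodes)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

class StaticHandshapeEncoder(nn.Module):
    """Two graph-convolution steps over the anatomical hand graph, then pooling."""
    def __init__(self, in_dim=3, hidden=64, out_dim=128):
        super().__init__()
        self.register_buffer("adj", normalized_adjacency(21, HAND_EDGES))
        self.gc1 = nn.Linear(in_dim, hidden)
        self.gc2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        # x: (batch, 21, in_dim) keypoint coordinates for one frame.
        h = torch.relu(self.adj @ self.gc1(x))  # propagate along anatomical edges
        h = self.adj @ self.gc2(h)
        return h.mean(dim=1)                    # (batch, out_dim) handshape embedding

# Example usage: encode a batch of 8 single-frame hand poses.
encoder = StaticHandshapeEncoder()
print(encoder(torch.randn(8, 21, 3)).shape)     # torch.Size([8, 128])
```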
Abstract
Handshapes serve a fundamental phonological role in signed languages, with American Sign Language employing approximately 50 distinct shapes. However, computational approaches rarely model handshapes explicitly, limiting both recognition accuracy and linguistic analysis. We introduce a novel graph neural network that separates temporal dynamics from static handshape configurations. Our approach combines anatomically informed graph structures with contrastive learning to address key challenges in handshape recognition, including subtle inter-class distinctions and temporal variations. We establish the first benchmark for structured handshape recognition in signing sequences, achieving 46% accuracy across 37 handshape classes, compared with 25% for baseline methods.
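For the temporal branch, the sketch below shows one common way such a contrastive objective could be instantiated: an InfoNCE-style loss between two views of the same signing segment, computed on pooled per-frame handshape embeddings. This is an assumed illustration; the paper's actual view construction, pooling, and loss details may differ.

```python
# Illustrative InfoNCE-style contrastive loss over segment-level handshape
# embeddings (assumed formulation, not the paper's exact objective).
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    """z_a, z_b: (batch, dim) embeddings of two views of the same segments."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Example usage: mean-pool per-frame embeddings into segment vectors, then
# contrast two views of each segment against the other segments in the batch.
batch, frames, dim = 8, 16, 128
view_a = torch.randn(batch, frames, dim).mean(dim=1)  # stand-in temporal pooling
view_b = torch.randn(batch, frames, dim).mean(dim=1)
print(info_nce(view_a, view_b))
```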