🤖 AI Summary
This study addresses the challenge autonomous vehicles face in accurately interpreting pedestrians' nonverbal traffic intentions conveyed through hand gestures, a capability that critically affects human–vehicle interaction safety. To this end, the authors propose a gesture classification framework based on 2D human pose estimation, extracting a 76-dimensional set of handcrafted static and dynamic features from normalised keypoints in real-world traffic videos. The method targets four canonical gesture categories: Stop, Go, Thank & Greet, and No Gesture. The analysis reveals that hand position and movement velocity play pivotal roles in gesture discrimination. Evaluated on the WIVW dataset, the approach achieves 87% classification accuracy, strengthening the ability of autonomous driving systems to understand and infer intentions from unstructured pedestrian behaviour.
📝 Abstract
Gestures are a key component of non-verbal communication in traffic, often facilitating pedestrian-to-driver interaction where formal traffic rules are insufficient. This becomes especially problematic for autonomous vehicles (AVs), which struggle to interpret such gestures. In this study, we present a gesture classification framework using 2D pose estimation applied to real-world video sequences from the WIVW dataset. We categorise gestures into four primary classes (Stop, Go, Thank & Greet, and No Gesture) and extract 76 static and dynamic features from normalised keypoints. Our analysis demonstrates that hand position and movement velocity are especially discriminative between gesture classes, achieving a classification accuracy of 87%. These findings not only improve the perceptual capabilities of AV systems but also contribute to a broader understanding of pedestrian behaviour in traffic contexts.
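Neither section specifies how the static and dynamic features are computed. As a rough illustration only, the sketch below shows one plausible way to derive hand-position and velocity features from normalised 2D keypoints. It assumes a COCO-style 17-keypoint skeleton, and the helper names (`normalise_keypoints`, `gesture_features`) and feature choices are hypothetical; the paper's actual 76-dimensional feature set is not reproduced here.

```python
import numpy as np

# Assumed COCO-style 17-keypoint indices (not specified in the paper).
L_SHOULDER, R_SHOULDER = 5, 6
L_WRIST, R_WRIST = 9, 10
L_HIP, R_HIP = 11, 12

def normalise_keypoints(kps):
    """Centre a (17, 2) keypoint array on the mid-hip and scale by torso length."""
    mid_hip = (kps[L_HIP] + kps[R_HIP]) / 2.0
    mid_shoulder = (kps[L_SHOULDER] + kps[R_SHOULDER]) / 2.0
    torso = np.linalg.norm(mid_shoulder - mid_hip) + 1e-6  # avoid division by zero
    return (kps - mid_hip) / torso

def gesture_features(frames, fps=30.0):
    """Build a feature vector from a (T, 17, 2) keypoint sequence.

    Static features: mean wrist positions relative to the shoulders.
    Dynamic features: mean and peak wrist speeds over the sequence.
    """
    norm = np.stack([normalise_keypoints(f) for f in frames])
    static = np.concatenate([
        (norm[:, L_WRIST] - norm[:, L_SHOULDER]).mean(axis=0),
        (norm[:, R_WRIST] - norm[:, R_SHOULDER]).mean(axis=0),
    ])
    # Frame-to-frame wrist displacement, scaled to units per second.
    vel = np.diff(norm[:, [L_WRIST, R_WRIST]], axis=0) * fps
    speed = np.linalg.norm(vel, axis=-1)  # shape (T-1, 2)
    dynamic = np.concatenate([speed.mean(axis=0), speed.max(axis=0)])
    return np.concatenate([static, dynamic])
```

A vector of this kind might then be fed to a lightweight classifier (e.g. a random forest or SVM) trained on labelled WIVW clips; the paper does not state which classifier was used.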