🤖 AI Summary
This study investigates the role of non-manual facial features (specifically the eyes, mouth, and full face) in vision-based isolated-word automatic sign language recognition (ASLR). The contribution of each facial region to recognition performance is quantified using both CNN and Transformer architectures on an isolated-sign benchmark, complemented by qualitative analysis via saliency maps. Results demonstrate that incorporating facial features significantly improves accuracy, with the mouth region yielding the largest gain and substantially outperforming both eyes-only and full-face inputs. This work advances beyond prior coarse-grained comparisons (e.g., "hand-only" vs. "hand+full-face") by introducing the first fine-grained attribution analysis of facial subregions in ASLR. It establishes that precise modeling of mouth dynamics is essential for robust ASLR and provides empirical grounding for designing multimodal representations tailored to sign language understanding.
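To make the saliency-based analysis mentioned above concrete, the sketch below computes a vanilla input-gradient saliency map for a toy frame classifier in PyTorch. The `TinySignClassifier` model, its layer sizes, and the choice of plain gradient saliency are assumptions for illustration only; the study's actual architectures and attribution method may differ.

```python
# A minimal sketch of gradient-based saliency for an isolated-sign classifier.
# TinySignClassifier is a hypothetical stand-in, not the paper's model.
import torch
import torch.nn as nn

class TinySignClassifier(nn.Module):
    """Toy CNN standing in for the study's recognizer (assumption)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        self.head = nn.Linear(16 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

def saliency_map(model: nn.Module, frame: torch.Tensor) -> torch.Tensor:
    """Vanilla gradient saliency: |d score / d pixel|, max over channels."""
    model.eval()
    frame = frame.clone().requires_grad_(True)
    score = model(frame.unsqueeze(0)).max()  # score of the predicted class
    score.backward()
    return frame.grad.abs().amax(dim=0)      # (H, W) importance map

if __name__ == "__main__":
    model = TinySignClassifier()
    frame = torch.rand(3, 64, 64)            # one RGB video frame
    print(saliency_map(model, frame).shape)  # torch.Size([64, 64])
```

High values in the resulting map mark pixels whose perturbation most affects the predicted score; aggregated over frames, such maps are one way to visualize whether the model attends to the mouth, the eyes, or elsewhere.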
📝 Abstract
Non-manual facial features play a crucial role in sign language communication, yet their importance in automatic sign language recognition (ASLR) remains underexplored. While prior studies have shown that incorporating facial features can improve recognition, related work often relies on hand-crafted feature extraction and rarely goes beyond comparing manual features alone against the combination of manual and facial features. In this work, we systematically investigate the contribution of distinct facial regions (eyes, mouth, and full face) using two different deep learning models (a CNN-based model and a Transformer-based model) trained on an SLR dataset of isolated signs with randomly selected classes. Through quantitative performance evaluation and qualitative saliency-map analysis, we reveal that the mouth is the most important non-manual facial feature, significantly improving accuracy. Our findings highlight the necessity of incorporating facial features in ASLR.
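For intuition on the region-wise setup the abstract describes, here is a minimal sketch of how eye, mouth, and full-face crops could be extracted from each frame before being fed to the two recognizers. The `REGIONS` bounding boxes, crop size, and helper name are hypothetical placeholders, not the paper's actual preprocessing pipeline.

```python
# A minimal sketch of the region-ablation input preparation: crop the eyes,
# mouth, or full face from a frame and resize it to a fixed model input size.
# All coordinates below are made-up placeholders (assumption), in practice
# they would come from a facial-landmark detector.
import torch
import torch.nn.functional as F

# Hypothetical landmark-derived boxes: (x0, y0, x1, y1) in pixel coordinates.
REGIONS = {
    "eyes":      (16, 12, 48, 28),
    "mouth":     (20, 36, 44, 56),
    "full_face": (8,   4, 56, 60),
}

def crop_region(frame: torch.Tensor, region: str, size: int = 32) -> torch.Tensor:
    """Crop one facial region from a (C, H, W) frame and resize it."""
    x0, y0, x1, y1 = REGIONS[region]
    patch = frame[:, y0:y1, x0:x1].unsqueeze(0)   # add batch dim for resize
    return F.interpolate(patch, size=(size, size), mode="bilinear",
                         align_corners=False).squeeze(0)

if __name__ == "__main__":
    frame = torch.rand(3, 64, 64)                 # one RGB video frame
    for name in REGIONS:
        print(name, crop_region(frame, name).shape)  # all (3, 32, 32)
```

Training one model per input variant (eyes, mouth, full face) and comparing accuracies is the ablation logic that isolates each region's contribution.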