🤖 AI Summary
Existing isolated sign language recognition (ISLR) methods are constrained by language-specific annotations and fixed vocabularies, limiting cross-lingual generalization and dynamic vocabulary expansion. This paper introduces the first pretraining paradigm that embeds sign gestures by their intrinsic semantic features, combining self-supervised representation learning with dense vector retrieval to enable one-shot, cross-lingual recognition of novel signs without fine-tuning. The method was co-created with the Deaf and Hard of Hearing (DHH) community to ensure inclusivity and practical applicability. Evaluated on a large cross-lingual dictionary of 10,235 unique signs from a language different from the training data, the approach achieves a state-of-the-art one-shot mean reciprocal rank (MRR) of 50.8%, substantially improving generalization over prior language-specific work.
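To make the retrieval step concrete, below is a minimal sketch of one-shot recognition by dense vector search, assuming a pretrained encoder has already mapped each sign video to a fixed-size embedding. All names (`one_shot_recognize`, `support_embeddings`, the toy glosses) are illustrative placeholders, not the paper's actual implementation.

```python
# Minimal sketch of one-shot recognition by dense vector retrieval.
# Assumes a pretrained encoder has already produced one embedding per
# support-set sign; names and toy data here are illustrative only.
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def one_shot_recognize(query_emb, support_embeddings, support_labels):
    """Return support labels ranked by cosine similarity to the query.

    support_embeddings: (N, D) array, one embedding per vocabulary sign
    (the one-shot support set); support_labels: list of N sign glosses.
    """
    sims = normalize(support_embeddings) @ normalize(query_emb)
    order = np.argsort(-sims)  # highest similarity first
    return [support_labels[i] for i in order]

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
support = rng.normal(size=(5, 16))               # 5 signs, 16-d embeddings
labels = ["HELLO", "THANKS", "PLEASE", "YES", "NO"]
query = support[2] + 0.1 * rng.normal(size=16)   # noisy view of "PLEASE"
print(one_shot_recognize(query, support, labels)[0])  # almost surely "PLEASE"
```

Because recognition reduces to nearest-neighbor search over embeddings, adding a new sign to the vocabulary only requires embedding one example of it, with no retraining.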
📝 Abstract
Isolated Sign Language Recognition (ISLR) is crucial for scalable sign language technology, yet language-specific approaches limit current models. To address this, we propose a one-shot learning approach that generalises across languages and evolving vocabularies. Our method pretrains a model to embed signs based on essential features and uses dense vector search for rapid, accurate recognition of unseen signs. We achieve state-of-the-art results, including 50.8% one-shot MRR on a large dictionary containing 10,235 unique signs from a different language than the training set. Our approach is robust across languages and support sets, offering a scalable, adaptable solution for ISLR. Co-created with the Deaf and Hard of Hearing (DHH) community, this method aligns with real-world needs and advances scalable sign language recognition.
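For reference, the reported metric, mean reciprocal rank (MRR), averages the reciprocal of the rank at which the correct sign appears in each query's retrieval list. A small self-contained sketch (the function name and toy data are illustrative):

```python
# Sketch of mean reciprocal rank (MRR): for each query, take 1 / rank of
# the correct sign in the retrieved list, then average over queries.
def mean_reciprocal_rank(ranked_lists, gold_labels):
    """ranked_lists[i] is the retrieval order for query i (best first);
    gold_labels[i] is that query's correct sign gloss."""
    reciprocal_ranks = []
    for ranking, gold in zip(ranked_lists, gold_labels):
        rank = ranking.index(gold) + 1  # 1-based rank of the correct sign
        reciprocal_ranks.append(1.0 / rank)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Correct sign at ranks 1, 2, and 4 -> MRR = (1 + 1/2 + 1/4) / 3 ≈ 0.583.
print(mean_reciprocal_rank(
    [["YES", "NO"], ["NO", "YES"], ["A", "B", "C", "YES"]],
    ["YES", "YES", "YES"],
))
```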