Logos as a Well-Tempered Pre-train for Sign Language Recognition

📅 2025-05-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two key challenges in isolated sign language recognition (ISLR): data scarcity for low-resource, minority sign languages and semantic ambiguity arising from visually similar signs. To this end, we introduce Logos, the largest publicly available Russian Sign Language (RSL) dataset to date and the first ISLR benchmark featuring explicit similarity-based sign grouping annotations to mitigate labeling ambiguity. We propose a cross-lingual general-purpose pretraining paradigm enabling few-shot transfer, and a multi-head joint training strategy to enhance generalization under low-data conditions. Our approach employs a single-stream RGB video model with optimized visual feature encoding and cross-lingual transfer mechanisms. Experiments demonstrate state-of-the-art performance on WLASL and the best-reported accuracy among single-stream RGB models on AUTSL. All code, pretrained models, and the Logos dataset are fully open-sourced.
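To make the multi-head joint training idea concrete, the sketch below shows one way it could look in PyTorch: a single shared video encoder with a separate classification head per dataset, so that batches from each sign language are routed to their own head while the encoder is updated on all of them. This is an illustrative reconstruction, not the authors' code; the module names, routing scheme, and training loop are assumptions.

```python
# Illustrative sketch (not the authors' code): one shared video encoder
# with a separate classification head per dataset, so low-resource sign
# languages share the encoder while keeping their own vocabularies.
import torch
import torch.nn as nn


class MultiHeadISLRModel(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: dict[str, int]):
        super().__init__()
        self.encoder = encoder  # single-stream RGB video backbone, assumed to return (B, feat_dim)
        # One linear head per dataset / sign language (dataset name -> vocabulary size).
        self.heads = nn.ModuleDict(
            {name: nn.Linear(feat_dim, n) for name, n in num_classes.items()}
        )

    def forward(self, video: torch.Tensor, dataset: str) -> torch.Tensor:
        feats = self.encoder(video)        # clip-level features, shape (B, feat_dim)
        return self.heads[dataset](feats)  # logits over that dataset's vocabulary


def joint_train_step(model, batch, optimizer, criterion=nn.CrossEntropyLoss()):
    # Each batch comes from a single dataset and is routed to that dataset's head.
    video, labels, dataset_name = batch
    logits = model(video, dataset_name)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```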

📝 Abstract
This paper examines two aspects of the isolated sign language recognition (ISLR) task. First, despite the availability of a number of datasets, the amount of data for most individual sign languages is limited. This poses the challenge of cross-language ISLR model training, including transfer learning. Second, similar signs can have different semantic meanings, which leads to ambiguity in dataset labeling and raises the question of the best policy for annotating such signs. To address these issues, this study presents Logos, a novel Russian Sign Language (RSL) dataset: the most extensive ISLR dataset by the number of signers, one of the largest available ISLR datasets overall, and the largest RSL dataset in size and vocabulary. It is shown that a model pre-trained on the Logos dataset can be used as a universal encoder for SLR tasks in other languages, including few-shot learning. We explore cross-language transfer learning approaches and find that joint training using multiple classification heads benefits accuracy for the target low-resource datasets the most. The key feature of the Logos dataset is explicitly annotated visually similar sign groups. We show that explicitly labeling visually similar signs improves the quality of the trained model as a visual encoder for downstream tasks. Based on the proposed contributions, we outperform current state-of-the-art results for the WLASL dataset and obtain competitive results for the AUTSL dataset with a single-stream model processing solely RGB video. The source code, dataset, and pre-trained models are publicly available.
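The abstract's claim that the Logos-pretrained model works as a universal encoder for few-shot SLR in other languages can be illustrated by treating the encoder as a frozen feature extractor and classifying query clips against per-class prototypes built from a handful of labeled support clips. The nearest-class-mean scheme below is an assumption for illustration; the paper does not specify the exact few-shot protocol here.

```python
# Minimal sketch of few-shot transfer with the pretrained encoder used as a
# frozen feature extractor. Nearest-class-mean classification is an assumed
# protocol, not necessarily the one used in the paper.
import torch
import torch.nn.functional as F


@torch.no_grad()
def few_shot_classify(encoder, support_videos, support_labels, query_videos):
    encoder.eval()
    support_feats = F.normalize(encoder(support_videos), dim=-1)
    query_feats = F.normalize(encoder(query_videos), dim=-1)

    # Build one prototype (mean embedding) per class from the few labeled examples.
    classes = support_labels.unique()
    prototypes = torch.stack(
        [support_feats[support_labels == c].mean(dim=0) for c in classes]
    )

    # Assign each query clip to the class whose prototype is most similar.
    sims = query_feats @ prototypes.T  # cosine similarity
    return classes[sims.argmax(dim=-1)]
```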
Problem

Research questions and friction points this paper is trying to address.

Limited data for individual sign languages hinders cross-language ISLR model training.
Similar signs with different meanings cause ambiguity in dataset labeling.
Need for a large, annotated dataset to improve sign language recognition accuracy.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-training on the Logos dataset yields a universal video encoder for cross-language ISLR, including few-shot transfer
Joint training with multiple dataset-specific classification heads improves accuracy on low-resource target datasets
Explicit annotation of visually similar sign groups improves the learned visual encoder (see the sketch below)
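One plausible way to use the similarity-group annotations during training is to map every sign id to its group id and add a group-level term next to the per-sign classification loss, so that confusing two signs within the same visually similar group is penalized less than confusing signs across groups. The loss combination below is a hypothetical sketch; the paper only reports that explicit group labels improve the trained visual encoder, not this exact formulation.

```python
# Hypothetical use of visually-similar-sign group annotations: combine a
# per-sign cross-entropy with a group-level loss obtained by pooling sign
# probabilities within each annotated group.
import torch
import torch.nn.functional as F


def grouped_loss(sign_logits, sign_labels, sign_to_group, num_groups, alpha=0.5):
    """sign_to_group: LongTensor mapping every sign id to its group id."""
    sign_to_group = sign_to_group.to(sign_logits.device)

    # Standard per-sign classification loss.
    sign_loss = F.cross_entropy(sign_logits, sign_labels)

    # Pool sign probabilities into group probabilities (sum within each group).
    probs = sign_logits.softmax(dim=-1)
    group_probs = torch.zeros(probs.size(0), num_groups, device=probs.device)
    group_probs.index_add_(1, sign_to_group, probs)

    # Group-level loss: the target group is the group of the true sign.
    group_labels = sign_to_group[sign_labels]
    group_loss = F.nll_loss(torch.log(group_probs + 1e-8), group_labels)

    return alpha * sign_loss + (1 - alpha) * group_loss
```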
Authors
Ilya G. Ovodov (SaluteDevices)
Petr Surovtsev (SaluteDevices)
Karina Kvanchiani (Tevian)
A. Kapitanov (SaluteDevices)
Alexander Nagaev (SberDevices)