Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language

📅 2025-05-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the underexplored challenge of mouthing modeling in German Sign Language Recognition (SLR). It is the first to investigate knowledge transfer from Visual Speech Recognition (VSR) to sign language mouthing recognition. We formulate mouthing recognition as a task semantically distinct yet structurally related to VSR, and propose a cross-task, multi-source joint training framework integrating multi-task learning, cross-vocabulary transfer, and temporal video modeling (via 3D CNNs or Transformers). A key contribution is the empirical validation that task similarity critically determines transfer efficacy, enabling mutually beneficial bidirectional performance gains between VSR and mouthing recognition. Experiments demonstrate substantial improvements in mouthing recognition accuracy, concurrent gains in VSR performance, and enhanced model robustness. The approach establishes a generalizable knowledge transfer paradigm for mouthing recognition, a domain severely constrained by scarce annotated data.
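The joint training framework described above can be sketched as a single shared encoder feeding two task-specific classification heads, one over a VSR word vocabulary and one over the mouthing vocabulary. This is a minimal NumPy sketch of the idea, not the paper's implementation: the class name, dimensions, and the tanh encoder are all illustrative assumptions (the paper uses 3D CNNs or Transformers as the temporal encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MultiTaskHeadModel:
    """Illustrative multi-task model: one shared encoder, two task heads.

    The shared weights are updated by gradients from both tasks during
    joint training, which is the mechanism behind the bidirectional
    gains between VSR and mouthing recognition."""

    def __init__(self, feat_dim, hidden_dim, n_vsr_words, n_mouthing_words):
        self.w_shared = rng.normal(0.0, 0.1, (feat_dim, hidden_dim))
        self.heads = {
            "vsr": rng.normal(0.0, 0.1, (hidden_dim, n_vsr_words)),
            "mouthing": rng.normal(0.0, 0.1, (hidden_dim, n_mouthing_words)),
        }

    def forward(self, features, task):
        shared = np.tanh(features @ self.w_shared)  # representation reused by both tasks
        return softmax(shared @ self.heads[task])

# Toy batch: 4 clips, each already pooled to a 64-d feature vector
# (standing in for the output of a temporal video encoder).
model = MultiTaskHeadModel(feat_dim=64, hidden_dim=32,
                           n_vsr_words=500, n_mouthing_words=100)
clips = rng.normal(size=(4, 64))
probs_vsr = model.forward(clips, "vsr")
probs_mouthing = model.forward(clips, "mouthing")
```

Each head produces a distribution over its own vocabulary, so the two tasks can have different label spaces while still sharing the encoder.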

๐Ÿ“ Abstract
Sign Language Recognition (SLR) systems primarily focus on manual gestures, but non-manual features such as mouth movements, specifically mouthing, provide valuable linguistic information. This work directly classifies mouthing instances to their corresponding words in the spoken language while exploring the potential of transfer learning from Visual Speech Recognition (VSR) to mouthing recognition in German Sign Language. We leverage three VSR datasets: one in English, one in German with unrelated words, and one in German containing the same target words as the mouthing dataset, to investigate the impact of task similarity in this setting. Our results demonstrate that multi-task learning improves both mouthing recognition and VSR accuracy, as well as model robustness, suggesting that mouthing recognition should be treated as a task distinct from, but related to, VSR. This research contributes to the field of SLR by proposing knowledge transfer from VSR to SLR datasets with limited mouthing annotations.
Problem

Research questions and friction points this paper is trying to address.

Classify mouthing in German Sign Language to spoken words
Explore transfer learning from Visual Speech Recognition
Improve recognition using multi-task learning and VSR datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfer learning from VSR to mouthing recognition
Multi-task learning improves recognition accuracy
Leveraging multiple VSR datasets for task similarity
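The cross-vocabulary idea in the list above can be sketched as merging the word lists of several datasets into one joint label space, so that a German word shared between a VSR dataset and the mouthing dataset maps to the same class index. The function name and the example words are hypothetical; the paper's actual label-merging procedure may differ.

```python
def build_joint_vocab(*label_sets):
    """Merge word lists from several datasets into one joint label space.

    A word occurring in more than one dataset keeps a single class index,
    which is what lets training signal from both tasks reach the same
    output unit (illustrative assumption about the transfer setup)."""
    vocab = {}
    for labels in label_sets:
        for word in labels:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

# Hypothetical overlap: "danke" appears both in a German VSR word list
# and among the mouthing dataset's target words.
vsr_de_words = ["hallo", "danke", "bitte"]
mouthing_words = ["danke", "morgen"]
joint = build_joint_vocab(vsr_de_words, mouthing_words)
```

With a shared index for overlapping words, a classifier trained on the joint space treats VSR clips and mouthing clips of the same word as the same class, which is one simple way to realize cross-vocabulary transfer.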