Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language

📅 2025-05-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the underexplored challenge of mouthing modeling in German Sign Language Recognition (SLR). It is the first to investigate knowledge transfer from Visual Speech Recognition (VSR) to sign language mouthing recognition. We formulate mouthing recognition as a task semantically distinct yet structurally related to VSR, and propose a cross-task, multi-source joint training framework integrating multi-task learning, cross-vocabulary transfer, and temporal video modeling (via 3D CNNs or Transformers). A key contribution is the empirical validation that task similarity critically determines transfer efficacy, enabling mutually beneficial bidirectional performance gains between VSR and mouthing recognition. Experiments demonstrate substantial improvements in mouthing recognition accuracy, concurrent gains in VSR performance, and enhanced model robustness. The approach establishes a generalizable knowledge transfer paradigm for mouthing recognition, a domain severely constrained by scarce annotated data.
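The joint training framework described above can be sketched as a single shared encoder feeding two task-specific classification heads, one over a VSR word vocabulary and one over the mouthing vocabulary. This is a minimal NumPy sketch of the idea, not the paper's implementation: the class name, dimensions, and the tanh encoder are all illustrative assumptions (the paper uses 3D CNNs or Transformers as the temporal encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MultiTaskHeadModel:
    """Illustrative multi-task model: one shared encoder, two task heads.

    The shared weights are updated by gradients from both tasks during
    joint training, which is the mechanism behind the bidirectional
    gains between VSR and mouthing recognition."""

    def __init__(self, feat_dim, hidden_dim, n_vsr_words, n_mouthing_words):
        self.w_shared = rng.normal(0.0, 0.1, (feat_dim, hidden_dim))
        self.heads = {
            "vsr": rng.normal(0.0, 0.1, (hidden_dim, n_vsr_words)),
            "mouthing": rng.normal(0.0, 0.1, (hidden_dim, n_mouthing_words)),
        }

    def forward(self, features, task):
        shared = np.tanh(features @ self.w_shared)  # representation reused by both tasks
        return softmax(shared @ self.heads[task])

# Toy batch: 4 clips, each already pooled to a 64-d feature vector
# (standing in for the output of a temporal video encoder).
model = MultiTaskHeadModel(feat_dim=64, hidden_dim=32,
                           n_vsr_words=500, n_mouthing_words=100)
clips = rng.normal(size=(4, 64))
probs_vsr = model.forward(clips, "vsr")
probs_mouthing = model.forward(clips, "mouthing")
```

Each head produces a distribution over its own vocabulary, so the two tasks can have different label spaces while still sharing the encoder.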

๐Ÿ“ Abstract
Sign Language Recognition (SLR) systems primarily focus on manual gestures, but non-manual features such as mouth movements, specifically mouthing, provide valuable linguistic information. This work directly classifies mouthing instances to their corresponding words in the spoken language while exploring the potential of transfer learning from Visual Speech Recognition (VSR) to mouthing recognition in German Sign Language. We leverage three VSR datasets: one in English, one in German with unrelated words, and one in German containing the same target words as the mouthing dataset, to investigate the impact of task similarity in this setting. Our results demonstrate that multi-task learning improves both mouthing recognition and VSR accuracy, as well as model robustness, suggesting that mouthing recognition should be treated as a task distinct from, but related to, VSR. This research contributes to the field of SLR by proposing knowledge transfer from VSR to SLR datasets with limited mouthing annotations.
Problem

Research questions and friction points this paper is trying to address.

Classify mouthing in German Sign Language to spoken words
Explore transfer learning from Visual Speech Recognition
Improve recognition using multi-task learning and VSR datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfer learning from VSR to mouthing recognition
Multi-task learning improves recognition accuracy
Leveraging multiple VSR datasets for task similarity
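The cross-vocabulary idea in the list above can be sketched as merging the word lists of several datasets into one joint label space, so that a German word shared between a VSR dataset and the mouthing dataset maps to the same class index. The function name and the example words are hypothetical; the paper's actual label-merging procedure may differ.

```python
def build_joint_vocab(*label_sets):
    """Merge word lists from several datasets into one joint label space.

    A word occurring in more than one dataset keeps a single class index,
    which is what lets training signal from both tasks reach the same
    output unit (illustrative assumption about the transfer setup)."""
    vocab = {}
    for labels in label_sets:
        for word in labels:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

# Hypothetical overlap: "danke" appears both in a German VSR word list
# and among the mouthing dataset's target words.
vsr_de_words = ["hallo", "danke", "bitte"]
mouthing_words = ["danke", "morgen"]
joint = build_joint_vocab(vsr_de_words, mouthing_words)
```

With a shared index for overlapping words, a classifier trained on the joint space treats VSR clips and mouthing clips of the same word as the same class, which is one simple way to realize cross-vocabulary transfer.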