🤖 AI Summary
Traditional sign language translation (SLT) relies on monolingual text supervision, resulting in poor generalization and limited cross-lingual scalability. To address these limitations, this work proposes a multilingual SLT framework. First, it introduces language-agnostic multimodal sentence embeddings—jointly encoding multilingual text and speech—as a unified supervisory signal, enabling direct cross-lingual translation without intermediate pivoting. Second, it designs a coupled augmentation strategy that jointly applies multilingual back-translation and video-level perturbations to mitigate data scarcity in low-resource settings. Implemented within an end-to-end neural machine translation architecture, the method achieves significant improvements over monolingual embedding baselines on the BLEURT metric—particularly for low-resource language pairs—demonstrating superior generalization, semantic robustness, and cross-lingual extensibility.
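The core idea of embedding supervision can be sketched in a few lines: a frozen, language-agnostic sentence encoder provides a fixed target vector, and the sign-video encoder is trained to regress onto it. The summary does not specify the exact objective, so the cosine-distance loss and all names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine_supervision_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Illustrative embedding-supervision objective: pull the video
    encoder's predicted sentence embedding toward the frozen
    language-agnostic target embedding via 1 - cosine similarity.
    (Assumed loss form; the paper's choice is not stated here.)"""
    pred = pred / np.linalg.norm(pred)
    target = target / np.linalg.norm(target)
    return 1.0 - float(pred @ target)

# A prediction already aligned with the target incurs (near-)zero loss.
e = np.array([0.6, 0.8, 0.0])
aligned = cosine_supervision_loss(e, e)

# An orthogonal prediction incurs the maximal "uncorrelated" loss of 1.
orthogonal = cosine_supervision_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Because the target embedding space is shared across languages and modalities, the same video-side regression objective supervises every target language at once, which is what removes the need for pivoting.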
📝 Abstract
Sign language translation (SLT) is typically trained with text in a single spoken language, which limits scalability and cross-language generalization. Earlier approaches have replaced gloss supervision with text-based sentence embeddings, but these embeddings have so far remained tied to a single language and modality. In contrast, we employ language-agnostic, multimodal embeddings trained on text and speech from multiple languages to supervise SLT, enabling direct multilingual translation. To address data scarcity, we propose a coupled augmentation method that combines multilingual target augmentations (i.e., translations of the target sentence into many languages) with video-level perturbations, improving model robustness. Experiments show consistent BLEURT gains over text-only sentence embedding supervision, with larger improvements in low-resource settings. Our results demonstrate that language-agnostic embedding supervision, combined with coupled augmentation, provides a scalable and semantically robust alternative to traditional SLT training.
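The coupled augmentation step can be illustrated as sampling, per training example, a target language from the multilingual translation pool together with a video-level perturbation. The abstract does not name a specific perturbation, so random frame dropout below is an assumed stand-in, and all function and variable names are hypothetical.

```python
import random

def coupled_augment(frames, translations, rng, drop_p=0.1):
    """Sketch of coupled augmentation (illustrative, not the paper's code):
    jointly sample a target language from the multilingual translation
    pool and perturb the video (here: random frame dropout)."""
    lang = rng.choice(sorted(translations))             # multilingual target side
    kept = [f for f in frames if rng.random() > drop_p]  # video-level perturbation
    return (kept or frames[:1]), translations[lang]      # keep at least one frame

rng = random.Random(0)
frames = list(range(16))                                 # dummy frame indices
translations = {"de": "Hallo Welt", "en": "Hello world", "fr": "Bonjour le monde"}
aug_frames, target_text = coupled_augment(frames, translations, rng)
```

Sampling the two augmentations jointly means each video is seen under many (perturbation, language) pairs across epochs, which is how the method stretches scarce parallel data in low-resource settings.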