Targeted Linguistic Analysis of Sign Language Models with Minimal Translation Pairs

📅 2026-04-29

🤖 AI Summary
It remains unclear whether current sign language translation models capture the multimodal linguistic phenomena of sign languages, which are expressed through manual signs as well as facial expressions and body posture. This work introduces ASL Minimal Translation Pairs (ASL-MTP), a new American Sign Language benchmark of minimal translation pairs grouped by linguistic phenomenon. Combined with input ablation experiments, in which different input cues are removed during training and inference, the benchmark is used to measure how much a model relies on manual versus non-manual cues. The results show that the state-of-the-art ASL-to-English translation model under study performs above chance on most phenomena, but it relies heavily on hand-based information and often misses crucial non-manual signals. These findings point to clear limitations in how well current sign language translation models make use of multimodal linguistic information.
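The minimal-pair idea can be made concrete with a short Python sketch. It assumes a hypothetical setup not described on this page: the translation model can assign a score (for example, a log-probability) to a candidate English translation of a video, and a pair counts as correct when the reference translation outscores its minimally different contrast, so chance level for two candidates is 0.5. The names `MinimalPair`, `Scorer`, and `accuracy_by_phenomenon` and the phenomenon labels are illustrative only and do not reflect the actual ASL-MTP data format.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class MinimalPair(NamedTuple):
    """One hypothetical benchmark item: a video plus two candidate translations."""
    phenomenon: str  # illustrative label, e.g. "negation"
    video: object    # placeholder for video frames or extracted features
    correct: str     # reference English translation
    contrast: str    # minimally different, incorrect translation


# Hypothetical scorer: a higher value means the model finds the translation more likely.
Scorer = Callable[[object, str], float]


def accuracy_by_phenomenon(pairs: Iterable[MinimalPair], score: Scorer) -> Dict[str, float]:
    """Per-phenomenon fraction of pairs in which the correct translation wins.

    Ties count as failures, so a model that assigns identical scores to both
    translations gets no credit for that pair.
    """
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for pair in pairs:
        totals[pair.phenomenon] += 1
        if score(pair.video, pair.correct) > score(pair.video, pair.contrast):
            wins[pair.phenomenon] += 1
    return {ph: wins[ph] / totals[ph] for ph in totals}
```

Reporting accuracy per phenomenon, rather than one overall number, is what allows a statement like "above chance on most phenomena" to be checked category by category against the 0.5 baseline.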
📝 Abstract
Models of sign language have historically lagged behind those for spoken language (text and speech). Recent work has greatly improved their performance on tasks like sign language translation and isolated sign recognition. However, it remains unclear to what extent existing models capture various linguistic phenomena of sign language, and how well they use cues from the multiple articulators used in sign language (hands, upper body, face). We introduce a new benchmark dataset for American Sign Language, ASL Minimal Translation Pairs (ASL-MTP), divided into multiple types of sign language phenomena and corresponding minimal pairs of translations, for performing such linguistic analyses. As a case study, we use ASL-MTP to analyze a state-of-the-art ASL-to-English translation model. We conduct a targeted analysis of the model by ablating various input cues during training and inference and evaluating on the phenomena in ASL-MTP. Our results show that, while the model performs above chance level on most of the phenomena, it relies strongly on manual cues while often missing crucial non-manual cues.
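Input-cue ablation can be sketched in Python in a similar way. This sketch assumes a hypothetical input representation in which features are already split into per-articulator streams (`hands`, `body`, `face`) and removes a cue by zeroing its stream so that input shapes stay fixed. The abstract does not say how the cues are removed, so zeroing is an illustrative choice; masking or dropping the stream would be equally plausible. `Features`, `ABLATION_CONDITIONS`, and `ablate` are hypothetical names.

```python
from typing import Dict, Iterable, List

# Hypothetical input format: each articulator name maps to one feature
# vector per video frame.
Features = Dict[str, List[List[float]]]

# Hypothetical ablation conditions: the streams removed in each run.
ABLATION_CONDITIONS: Dict[str, List[str]] = {
    "full input": [],
    "hands only": ["body", "face"],
    "no face": ["face"],
    "no hands": ["hands"],
}


def ablate(features: Features, drop: Iterable[str]) -> Features:
    """Return a copy of `features` with the named streams replaced by zeros.

    Zeroing (rather than deleting) a stream keeps every input the same shape,
    so one model architecture can be trained or evaluated under each condition.
    """
    drop_set = set(drop)
    unknown = drop_set - set(features)
    if unknown:
        raise KeyError(f"unknown articulators: {sorted(unknown)}")
    ablated: Features = {}
    for name, frames in features.items():
        if name in drop_set:
            ablated[name] = [[0.0] * len(vec) for vec in frames]
        else:
            ablated[name] = [list(vec) for vec in frames]  # copy; input stays untouched
    return ablated
```

Each ablated copy could be used for training, for inference, or for both, which mirrors the train-time and test-time ablations described in the abstract. Comparing results across conditions then shows how much performance depends on each articulator.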
Problem

Research questions and friction points this paper is trying to address.

sign language
linguistic phenomena
non-manual cues
minimal pairs
multimodal articulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

sign language modeling
minimal translation pairs
linguistic analysis
non-manual cues
benchmark dataset