🤖 AI Summary
Existing sign language models predominantly rely on isolated-vocabulary or translation-based data, neglecting the pragmatically driven dynamic variation of natural dialogue—particularly spatiotemporal gesture compression and articulatory adaptation arising from teacher-student interaction in STEM education. Method: We introduce the first ASL STEM dialogue motion-capture dataset and comparatively analyze three contexts: dyadic interaction, monologic instruction, and translated text. We quantify dialogue-specific coordination effects (non-additive effort reduction) and their impact on gesture duration. Contribution/Results: Dialogue yields significant gesture compression (24.6%–44.6% shorter than isolated signing; nonsignificant in monologues). Integrating kinematic feature extraction, computational modeling, and linguistic analysis, we demonstrate fundamental limitations of current embedding models in capturing coordinated sign expression. This work bridges a critical gap in modeling natural interactive sign language and provides theoretical foundations and empirical evidence for adaptive, education-oriented sign language technologies.
📝 Abstract
Most state-of-the-art sign language models are trained on interpreter or isolated-vocabulary data, which overlooks the variability that characterizes natural dialogue. Human communication, however, dynamically adapts to context and interlocutor through spatiotemporal changes and articulation style. This is especially evident in educational settings, where teachers and students use novel vocabulary. To address this gap, we collect a motion-capture dataset of American Sign Language (ASL) STEM (Science, Technology, Engineering, and Mathematics) dialogue that enables quantitative comparison between dyadic interactive signing, solo signed lectures, and interpreted articles. Using continuous kinematic features, we disentangle dialogue-specific entrainment from individual effort reduction and show spatiotemporal changes across repeated mentions of STEM terms. On average, dialogue signs are 24.6%-44.6% shorter in duration than isolated signs, and show significant reductions absent in monologue contexts. Finally, we evaluate sign embedding models on their ability to recognize STEM signs and to approximate how entrained the participants become over time. Our study bridges linguistic analysis and computational modeling to understand how pragmatics shapes sign articulation and its representation in sign language technologies.
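To make the headline duration numbers concrete, here is a minimal sketch of the relative-shortening arithmetic. The durations and the `percent_reduction` helper are illustrative inventions, not from the paper's pipeline:

```python
# Illustrative only: how a "X% shorter in dialogue" figure is computed
# from mean sign durations. Values below are made-up placeholders.

def percent_reduction(isolated_s: float, dialogue_s: float) -> float:
    """Relative shortening of dialogue signs vs. isolated signing, in percent."""
    return 100.0 * (isolated_s - dialogue_s) / isolated_s

# Hypothetical mean durations (seconds) for one STEM sign:
isolated = 1.30
dialogue = 0.85

print(f"{percent_reduction(isolated, dialogue):.1f}% shorter in dialogue")
# 34.6% shorter — a value inside the paper's reported 24.6%-44.6% range
```

In the study itself, such reductions are computed from continuous kinematic features over many signs and participants; this snippet only shows the shape of the comparison.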