AI Summary
This work addresses communication barriers faced by the Deaf and hard-of-hearing community in Greece by introducing the first end-to-end bidirectional Transformer framework for Greek Sign Language Production (SLP), enabling mutual translation between Greek spoken-language text and sign pose sequences. Methodologically: (1) we propose a data-driven morphemic sign representation, augmented with extended skeletal motion encoding to enhance pose modeling; (2) we design a video-text joint pretraining strategy coupled with a hybrid decoding schedule integrating teacher-forcing and autoregressive inference. Evaluated on the Elementary23 dataset, our approach achieves state-of-the-art (SOTA) performance in both motion naturalness and lexical accuracy of generated sign videos. Ablation studies confirm the efficacy of each component. This work establishes the first end-to-end benchmark for Greek Sign Language generation and provides a reusable technical paradigm for low-resource sign language production.
Abstract
Sign languages are the primary form of communication for Deaf communities across the world. To break down the communication barriers between the Deaf and hard-of-hearing and the hearing communities, it is imperative to build systems capable of translating spoken language into sign language and vice versa. Building on insights from previous research, we propose a deep learning model for Sign Language Production (SLP), which to our knowledge is the first attempt at Greek SLP. We tackle this task with a transformer-based architecture that translates text input into human pose keypoints, and vice versa. We evaluate the effectiveness of the proposed pipeline on the Greek SL dataset Elementary23 through a series of comparative analyses and ablation studies. The pipeline's components, which include data-driven gloss generation, training through video-to-text translation, and a scheduling algorithm for teacher-forcing and auto-regressive decoding, appear to improve the quality of the produced SL videos.
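The abstract mentions a scheduling algorithm for teacher-forcing and auto-regressive decoding but does not describe it. The sketch below illustrates the general idea with a generic scheduled-sampling scheme, which is an assumption and not necessarily the authors' algorithm. The linear decay, the function names `teacher_forcing_prob` and `decode_sequence`, and the `step_fn` stand-in for the pose decoder are all hypothetical.

```python
import random


def teacher_forcing_prob(epoch, total_epochs, start=1.0, end=0.0):
    """Linearly decay the teacher-forcing probability from start to end.

    Assumption: a linear schedule over epochs; the paper may use another.
    """
    if total_epochs <= 1:
        return end
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * frac


def decode_sequence(step_fn, first_pose, target_poses, tf_prob, rng=None):
    """Decode one pose sequence, mixing ground-truth and predicted inputs.

    step_fn(prev_pose) -> next_pose is a hypothetical stand-in for the decoder.
    """
    rng = rng or random.Random()
    prev = first_pose
    outputs = []
    for target in target_poses:
        pred = step_fn(prev)
        outputs.append(pred)
        # With probability tf_prob, feed the ground-truth pose to the next
        # step (teacher forcing); otherwise feed the model's own prediction
        # (auto-regressive decoding).
        prev = target if rng.random() < tf_prob else pred
    return outputs
```

Early in training, a probability near 1 keeps the decoder conditioned on ground-truth poses. As it decays toward 0, the decoder increasingly conditions on its own outputs, which is closer to how it runs at inference time.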