TSLFormer: A Lightweight Transformer Model for Turkish Sign Language Recognition Using Skeletal Landmarks

📅 2025-05-11

🤖 AI Summary

Problem: Accurate and efficient word-level Turkish Sign Language (TSL) recognition remains challenging for real-time assistive communication systems. Method: The paper reformulates sign language recognition as a sequence-to-sequence translation task, using only 3D skeletal coordinates of the hands and torso extracted with MediaPipe. It proposes a lightweight, temporally aware Transformer architecture designed for skeletal sequence modeling, integrating linguistic constraints and strict real-time requirements. Contribution/Results: Evaluated on the AUTSL dataset (36,000+ samples, 227 vocabulary items), the method achieves competitive accuracy while significantly reducing model parameters. It attains sub-30 ms inference latency per frame, meeting stringent mobile deployment requirements. The approach targets high-accuracy, low-latency assistive communication systems supporting the Deaf and hard-of-hearing community.

📝 Abstract

This study presents TSLFormer, a lightweight and robust word-level Turkish Sign Language (TSL) recognition model that treats sign gestures as ordered, string-like language. Instead of raw RGB or depth video, our method works only with the 3D joint positions (articulation points) of the hands and torso, extracted with Google's MediaPipe library. This reduces input dimensionality efficiently while preserving important semantic gesture information. Our approach recasts sign language recognition as sequence-to-sequence translation, inspired by the linguistic nature of sign languages and the success of transformers in natural language processing. Because TSLFormer uses self-attention, it captures temporal co-occurrence within gesture sequences and highlights meaningful motion patterns as words unfold. Evaluated on the AUTSL dataset, with over 36,000 samples and 227 different words, TSLFormer achieves competitive performance with minimal computational cost. These results indicate that joint-based input is sufficient for enabling real-time, mobile, and assistive communication systems for hearing-impaired individuals.
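
To make the dimensionality argument concrete, the following minimal sketch compares the size of one raw RGB frame with the size of one landmark vector. It relies on MediaPipe's 21 landmarks per hand with three coordinates each. The torso subset of four points (`TORSO_LANDMARKS`) and the 512x512 frame size are assumptions chosen for illustration, since the abstract does not state which torso landmarks or input resolution TSLFormer uses. All function names are hypothetical.

```python
# Illustrative arithmetic only. The torso landmark count and the frame size
# are assumptions, not values taken from the paper.
HAND_LANDMARKS = 21   # MediaPipe Hands returns 21 landmarks per hand
NUM_HANDS = 2
TORSO_LANDMARKS = 4   # hypothetical subset: both shoulders and both hips
COORDS = 3            # x, y, z per landmark


def landmark_features_per_frame():
    """Number of scalar values in one landmark-based frame."""
    return (HAND_LANDMARKS * NUM_HANDS + TORSO_LANDMARKS) * COORDS


def rgb_values_per_frame(width=512, height=512, channels=3):
    """Number of scalar values in one raw RGB frame (assumed resolution)."""
    return width * height * channels


def flatten_frame(landmarks):
    """Flatten a list of (x, y, z) tuples into a single feature vector."""
    return [coord for point in landmarks for coord in point]


features = landmark_features_per_frame()
pixels = rgb_values_per_frame()
print(f"landmark features per frame: {features}")
print(f"raw RGB values per frame:    {pixels}")
print(f"reduction factor:            {pixels / features:.0f}x")
```

Under these assumptions each frame shrinks from hundreds of thousands of pixel values to a vector of 138 coordinates, which is the kind of reduction that makes a small sequence model plausible on mobile hardware.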

Problem

Research questions and friction points this paper is trying to address.

Develop a lightweight model for Turkish Sign Language recognition

Use skeletal landmarks to reduce input dimensionality

Enable real-time assistive communication for hearing-impaired individuals

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D joint positions for input dimensionality reduction

Applies sequence-to-sequence translation with transformers

Captures temporal co-occurrence in gesture sequences
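
As a rough illustration of how self-attention relates frames within a gesture sequence, the sketch below implements single-head scaled dot-product self-attention in plain Python. It is a simplification and not the TSLFormer architecture: the query, key, and value projections are taken to be the identity, there is no positional encoding, and the toy two-dimensional "frames" stand in for real landmark vectors. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections for queries, keys, and values.
    frames: list of T feature vectors, each of length d.
    Returns (outputs, weights), where weights[t][s] is how strongly
    frame t attends to frame s.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    weights = []
    for query in frames:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in frames
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, frames)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Toy sequence: frames 0 and 3 share the same pose, frame 2 differs.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(frames)
for t, row in enumerate(weights):
    print(t, [round(w, 3) for w in row])
```

In this toy example frame 0 attends more strongly to frame 3, which repeats its pose, than to frame 2, which is how co-occurring motion patterns at distant time steps can be linked without recurrence.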
