Large Sign Language Models: Toward 3D American Sign Language Translation

📅 2025-11-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing sign language recognition methods rely on 2D video inputs and struggle to model spatial configurations, pose dynamics, and depth information effectively. Method: This paper proposes the first end-to-end large language model (LLM)-based framework for translating 3D American Sign Language (ASL) into text. It integrates 3D pose estimation features with a multimodal encoder and introduces an instruction-guided LLM architecture that supports external prompt conditioning for translation control; instruction tuning enables semantic understanding of 3D gesture sequences and fluent natural language generation. Contribution/Results: Experiments demonstrate substantial improvements in robustness and accuracy over state-of-the-art 2D baselines under complex real-world conditions. This work pioneers the direct application of LLMs to 3D sign language translation, establishing a new paradigm for virtual communication by deaf and hard-of-hearing individuals while advancing the convergence of multimodal language understanding and embodied intelligence systems.
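
A minimal sketch of the pipeline the summary describes: 3D pose sequences pass through a multimodal encoder, are projected into the LLM's embedding space, and are optionally prepended with an instruction prompt. This is not the authors' implementation; the module names, dimensions, and the choice of a Transformer encoder are assumptions.

```python
# Minimal sketch (not the authors' code): 3D pose sequences -> multimodal
# encoder -> projection into the LLM token space -> instruction-conditioned
# text generation. All module names and dimensions are assumptions.
import torch
import torch.nn as nn


class Pose3DEncoder(nn.Module):
    """Encodes a sequence of 3D joint positions into frame-level features."""

    def __init__(self, num_joints=54, d_model=512, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(num_joints * 3, d_model)  # flatten (x, y, z) per joint
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, poses):                  # poses: (batch, frames, joints, 3)
        b, t, j, _ = poses.shape
        x = self.embed(poses.reshape(b, t, j * 3))
        return self.encoder(x)                 # (batch, frames, d_model)


class LSLMSketch(nn.Module):
    """Projects 3D gesture features into an LLM's embedding space and
    optionally prepends an embedded instruction prompt."""

    def __init__(self, llm, llm_dim=4096, d_model=512):
        super().__init__()
        self.pose_encoder = Pose3DEncoder(d_model=d_model)
        self.projector = nn.Linear(d_model, llm_dim)  # gesture features -> LLM token space
        self.llm = llm                                # any causal LM accepting inputs_embeds

    def forward(self, poses, instruction_embeds=None):
        gesture_tokens = self.projector(self.pose_encoder(poses))
        if instruction_embeds is not None:            # instruction-guided setting
            inputs = torch.cat([instruction_embeds, gesture_tokens], dim=1)
        else:                                         # direct translation setting
            inputs = gesture_tokens
        return self.llm(inputs_embeds=inputs)
```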

📝 Abstract
We present Large Sign Language Models (LSLM), a novel framework for translating 3D American Sign Language (ASL) by leveraging Large Language Models (LLMs) as the backbone, which can benefit hearing-impaired individuals' virtual communication. Unlike existing sign language recognition methods that rely on 2D video, our approach directly utilizes 3D sign language data to capture rich spatial, gestural, and depth information in 3D scenes. This enables more accurate and resilient translation, enhancing digital communication accessibility for the hearing-impaired community. Beyond the task of ASL translation, our work explores the integration of complex, embodied multimodal languages into the processing capabilities of LLMs, moving beyond purely text-based inputs to broaden their understanding of human communication. We investigate both direct translation from 3D gesture features to text and an instruction-guided setting where translations can be modulated by external prompts, offering greater flexibility. This work provides a foundational step toward inclusive, multimodal intelligent systems capable of understanding diverse forms of language.
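
As a hypothetical illustration of the two settings the abstract contrasts, the snippet below builds on the LSLMSketch class sketched under the AI summary: direct translation feeds only the projected gesture tokens, while the instruction-guided setting prepends an embedded external prompt. The backbone checkpoint and prompt wording are placeholders, not the authors' released artifacts.

```python
# Hypothetical usage of the LSLMSketch class from the sketch above.
# Checkpoint name, joint count, and prompt text are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")   # placeholder backbone
llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = LSLMSketch(llm, llm_dim=llm.config.hidden_size)

poses = torch.randn(1, 128, 54, 3)   # 128 frames of 54 3D joints (dummy data)

# Direct translation: only the projected gesture tokens are fed to the LLM.
direct_out = model(poses)

# Instruction-guided translation: an embedded external prompt is prepended
# to steer the output, e.g. toward a more formal register.
prompt_ids = tokenizer("Translate the signed sentence into formal English:",
                       return_tensors="pt").input_ids
prompt_embeds = llm.get_input_embeddings()(prompt_ids)
guided_out = model(poses, instruction_embeds=prompt_embeds)
```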
Problem

Research questions and friction points this paper is trying to address.

Translating 3D American Sign Language using LLMs
Capturing spatial and gestural information from 3D data
Enhancing communication accessibility for hearing-impaired individuals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for 3D American Sign Language translation
Directly using 3D data to capture spatial and depth information
Exploring gesture-to-text translation with instruction-guided modulation