Large Sign Language Models: Toward 3D American Sign Language Translation

📅 2025-11-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing sign language recognition methods rely on 2D video inputs and struggle to model spatial configurations, pose dynamics, and depth information effectively. Method: This paper proposes the first end-to-end large language model (LLM)-based framework for translating 3D American Sign Language (ASL) into text. It integrates 3D pose estimation features with a multimodal encoder and introduces an instruction-guided LLM architecture that supports external prompt conditioning for translation control; instruction tuning enables semantic understanding of 3D gesture sequences and fluent natural language generation. Contribution/Results: Experiments demonstrate substantial improvements in robustness and accuracy over state-of-the-art 2D baselines under complex real-world conditions. This work pioneers the direct application of LLMs to 3D sign language translation, establishing a new paradigm for virtual communication by deaf and hard-of-hearing individuals while advancing the convergence of multimodal language understanding and embodied intelligence systems.
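
A minimal sketch of the pipeline the summary describes: 3D pose sequences pass through a multimodal encoder, are projected into the LLM's embedding space, and are optionally prepended with an instruction prompt. This is not the authors' implementation; the module names, dimensions, and the choice of a Transformer encoder are assumptions.

```python
# Minimal sketch (not the authors' code): 3D pose sequences -> multimodal
# encoder -> projection into the LLM token space -> instruction-conditioned
# text generation. All module names and dimensions are assumptions.
import torch
import torch.nn as nn


class Pose3DEncoder(nn.Module):
    """Encodes a sequence of 3D joint positions into frame-level features."""

    def __init__(self, num_joints=54, d_model=512, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(num_joints * 3, d_model)  # flatten (x, y, z) per joint
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, poses):                  # poses: (batch, frames, joints, 3)
        b, t, j, _ = poses.shape
        x = self.embed(poses.reshape(b, t, j * 3))
        return self.encoder(x)                 # (batch, frames, d_model)


class LSLMSketch(nn.Module):
    """Projects 3D gesture features into an LLM's embedding space and
    optionally prepends an embedded instruction prompt."""

    def __init__(self, llm, llm_dim=4096, d_model=512):
        super().__init__()
        self.pose_encoder = Pose3DEncoder(d_model=d_model)
        self.projector = nn.Linear(d_model, llm_dim)  # gesture features -> LLM token space
        self.llm = llm                                # any causal LM accepting inputs_embeds

    def forward(self, poses, instruction_embeds=None):
        gesture_tokens = self.projector(self.pose_encoder(poses))
        if instruction_embeds is not None:            # instruction-guided setting
            inputs = torch.cat([instruction_embeds, gesture_tokens], dim=1)
        else:                                         # direct translation setting
            inputs = gesture_tokens
        return self.llm(inputs_embeds=inputs)
```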

📝 Abstract
We present Large Sign Language Models (LSLM), a novel framework for translating 3D American Sign Language (ASL) by leveraging Large Language Models (LLMs) as the backbone, which can benefit hearing-impaired individuals' virtual communication. Unlike existing sign language recognition methods that rely on 2D video, our approach directly utilizes 3D sign language data to capture rich spatial, gestural, and depth information in 3D scenes. This enables more accurate and resilient translation, enhancing digital communication accessibility for the hearing-impaired community. Beyond the task of ASL translation, our work explores the integration of complex, embodied multimodal languages into the processing capabilities of LLMs, moving beyond purely text-based inputs to broaden their understanding of human communication. We investigate both direct translation from 3D gesture features to text and an instruction-guided setting where translations can be modulated by external prompts, offering greater flexibility. This work provides a foundational step toward inclusive, multimodal intelligent systems capable of understanding diverse forms of language.
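
As a hypothetical illustration of the two settings the abstract contrasts, the snippet below builds on the LSLMSketch class sketched under the AI summary: direct translation feeds only the projected gesture tokens, while the instruction-guided setting prepends an embedded external prompt. The backbone checkpoint and prompt wording are placeholders, not the authors' released artifacts.

```python
# Hypothetical usage of the LSLMSketch class from the sketch above.
# Checkpoint name, joint count, and prompt text are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")   # placeholder backbone
llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = LSLMSketch(llm, llm_dim=llm.config.hidden_size)

poses = torch.randn(1, 128, 54, 3)   # 128 frames of 54 3D joints (dummy data)

# Direct translation: only the projected gesture tokens are fed to the LLM.
direct_out = model(poses)

# Instruction-guided translation: an embedded external prompt is prepended
# to steer the output, e.g. toward a more formal register.
prompt_ids = tokenizer("Translate the signed sentence into formal English:",
                       return_tensors="pt").input_ids
prompt_embeds = llm.get_input_embeddings()(prompt_ids)
guided_out = model(poses, instruction_embeds=prompt_embeds)
```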
Problem

Research questions and friction points this paper is trying to address.

Translating 3D American Sign Language using LLMs
Capturing spatial and gestural information from 3D data
Enhancing communication accessibility for hearing-impaired individuals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for 3D American Sign Language translation
Directly using 3D data to capture spatial and depth information
Exploring gesture-to-text translation with instruction-guided modulation