Bootstrapping Sign Language Annotations with Sign Language Models

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of high-quality annotated sign language data with a multimodal pseudo-labeling pipeline that, for the first time, integrates a fingerspelling recognizer, an isolated sign recognition (ISR) model, and a K-shot-prompted large language model. The pipeline automatically generates timestamped lexical, fingerspelled, and sequence-level pseudo-labels from partially annotated sign language videos and their English transcripts. The approach yields the first professionally curated sign language benchmark with full-sequence annotations, releasing nearly 500 expert-annotated videos alongside over 300 hours of pseudo-labeled data. The underlying baseline models are strong in their own right, reaching a character error rate of 6.7% on FSBoard and 74% top-1 accuracy on ASL Citizen.
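The character error rate (CER) quoted above is the edit distance between the predicted and reference character sequences, normalized by the reference length. A minimal sketch of the metric (the `cer` helper and the example strings are illustrative, not taken from the paper's code):

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(hyp: str, ref: str) -> float:
    # Character error rate: minimum edits divided by reference length.
    return edit_distance(hyp, ref) / len(ref)

print(round(cer("QUCK", "QUICK"), 2))  # one missing character over 5 -> 0.2
```

A CER of 6.7% therefore means roughly one character edit per fifteen reference characters, which matters for fingerspelling, where every character carries lexical content.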
📝 Abstract
AI-driven sign language interpretation is limited by a lack of high-quality annotated data. New datasets such as ASL STEM Wiki and FLEURS-ASL feature professional interpreters and hundreds of hours of footage, but they remain only partially annotated and thus underutilized, in part due to the prohibitive cost of annotating at this scale. In this work, we develop a pseudo-annotation pipeline that takes signed video and English text as input and outputs a ranked set of likely annotations, including time intervals, for glosses, fingerspelled words, and sign classifiers. Our pipeline uses sparse predictions from our fingerspelling recognizer and isolated sign recognizer (ISR), along with a K-shot LLM approach, to estimate these annotations. In service of this pipeline, we establish simple yet effective baseline fingerspelling and ISR models, achieving state of the art on the FSBoard (6.7% CER) and ASL Citizen (74% top-1 accuracy) datasets. To validate the pipeline and provide a gold-standard benchmark, a professional interpreter annotated nearly 500 videos from ASL STEM Wiki with sequence-level labels covering glosses, classifiers, and fingerspelled signs. These human annotations and over 300 hours of pseudo-annotations are released in the supplemental material.
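The abstract describes merging sparse, timestamped predictions from several recognizers into one ranked candidate list. A minimal structural sketch of that idea, assuming a confidence-ranked merge; the names (`SpanLabel`, `rank_pseudo_labels`, the `source` tags, and the example spans) are illustrative assumptions, not the paper's actual API:

```python
from dataclasses import dataclass

@dataclass
class SpanLabel:
    start: float   # seconds into the video
    end: float     # seconds into the video
    label: str     # gloss, classifier, or fingerspelled word
    source: str    # hypothetical tag: "isr", "fingerspelling", or "llm"
    score: float   # recognizer confidence in [0, 1]

def rank_pseudo_labels(predictions: list[SpanLabel]) -> list[SpanLabel]:
    """Merge sparse predictions from multiple recognizers into a single
    list of candidate annotations, ranked by confidence (a stand-in for
    whatever ranking the paper's pipeline actually uses)."""
    return sorted(predictions, key=lambda p: p.score, reverse=True)

# Hypothetical sparse outputs for one short clip.
preds = [
    SpanLabel(1.2, 1.9, "SCIENCE", "isr", 0.91),
    SpanLabel(2.0, 3.4, "W-I-K-I", "fingerspelling", 0.83),
    SpanLabel(1.2, 1.9, "CHEMISTRY", "isr", 0.40),  # competing ISR hypothesis
]
print([p.label for p in rank_pseudo_labels(preds)])
# -> ['SCIENCE', 'W-I-K-I', 'CHEMISTRY']
```

Keeping a ranked set rather than a single hard label is what lets downstream consumers (or the K-shot LLM step) resolve competing hypotheses for the same time interval.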
Problem

Research questions and friction points this paper is trying to address.

sign language annotation
data scarcity
pseudo-annotation
hand-labeled data
annotation cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

pseudo-annotation
sign language recognition
fingerspelling recognizer
isolated sign recognizer
K-Shot LLM