Uni-Sign: Toward Unified Sign Language Understanding at Scale

📅 2025-01-25

🤖 AI Summary
Existing sign language understanding (SLU) methods suffer from a significant gap between pre-training and downstream fine-tuning, which limits how well pre-trained knowledge transfers and leads to suboptimal downstream results. To address this, we propose Uni-Sign, a unified SLU framework: (1) we reformulate downstream SLU tasks, including sign recognition and translation, as a single generative sign language translation (SLT) task, establishing a unified task paradigm; (2) we introduce a Prior-Guided Fusion (PGF) module and a score-aware sampling strategy to efficiently fuse pose and RGB features, compensating for keypoint inaccuracies while reducing computational cost; (3) we construct CSL-News, a large-scale Chinese Sign Language (CSL) dataset comprising 1,985 hours of video paired with textual annotations. The framework achieves state-of-the-art performance on multiple downstream SLU benchmarks. Both the source code and the CSL-News dataset are publicly released.
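
Step (1) can be illustrated with a minimal Python sketch. It assumes three downstream tasks (isolated sign recognition, continuous sign recognition, and translation) and assumes their labels are rewritten as plain text targets for one generative decoder. The class SLUSample and the functions to_translation_target and islr_prediction are hypothetical names, not the released Uni-Sign API, and the paper's actual target formats may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class SLUSample:
    """One training example (hypothetical structure for illustration)."""
    task: str                     # assumed task names: "islr", "cslr", or "slt"
    label: Union[str, List[str]]  # a word, a gloss sequence, or a sentence


def to_translation_target(sample: SLUSample) -> str:
    """Rewrite any SLU label as the text a generative SLT decoder should emit."""
    if sample.task == "islr":
        # isolated recognition: the "translation" is the single word label
        return str(sample.label)
    if sample.task == "cslr":
        # continuous recognition: the gloss sequence becomes a space-separated string
        return " ".join(sample.label)
    if sample.task == "slt":
        # translation: the spoken-language sentence is already text
        return str(sample.label)
    raise ValueError(f"unknown task: {sample.task!r}")


def islr_prediction(generated: str, vocabulary: List[str]) -> Optional[str]:
    """Map generated text back to a class label by exact match (assumed rule)."""
    text = generated.strip()
    return text if text in vocabulary else None


# usage: three tasks, one text-target format
for s in [SLUSample("islr", "hello"),
          SLUSample("cslr", ["I", "GO", "SCHOOL"]),
          SLUSample("slt", "I am going to school.")]:
    print(s.task, "->", to_translation_target(s))
```

Because every task now produces text, a single sequence-to-sequence loss and decoder can serve all of them; the cost is an extra mapping step (here, an exact string match) when a task such as isolated recognition needs a class label back.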

📝 Abstract
Sign language pre-training has gained increasing attention for its ability to enhance performance across various sign language understanding (SLU) tasks. However, existing methods often suffer from a gap between pre-training and fine-tuning, leading to suboptimal results. To address this, we propose Uni-Sign, a unified pre-training framework that eliminates the gap between pre-training and downstream SLU tasks through a large-scale generative pre-training strategy and a novel fine-tuning paradigm. First, we introduce CSL-News, a large-scale Chinese Sign Language (CSL) dataset containing 1,985 hours of video paired with textual annotations, which enables effective large-scale pre-training. Second, Uni-Sign unifies SLU tasks by treating downstream tasks as a single sign language translation (SLT) task during fine-tuning, ensuring seamless knowledge transfer between pre-training and fine-tuning. Furthermore, we incorporate a prior-guided fusion (PGF) module and a score-aware sampling strategy to efficiently fuse pose and RGB information, addressing keypoint inaccuracies and improving computational efficiency. Extensive experiments across multiple SLU benchmarks demonstrate that Uni-Sign achieves state-of-the-art performance on these downstream tasks. Dataset and code are available at github.com/ZechengLi19/Uni-Sign.
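
The abstract only names the score-aware sampling strategy, so the following Python sketch rests on an assumption: frames whose pose keypoints have low confidence scores are the ones most worth sending through the more expensive RGB branch, because the RGB stream is meant to compensate for keypoint inaccuracies. It draws k frame indices without replacement with weight (1 - score) using Efraimidis-Spirakis keys. The function name score_aware_sample, the weighting rule, and the eps parameter are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional, Sequence


def score_aware_sample(
    scores: Sequence[float],
    k: int,
    seed: Optional[int] = None,
    eps: float = 1e-6,
) -> List[int]:
    """Return k frame indices in temporal order, favoring low-confidence frames.

    Hypothetical sketch: each frame gets weight (1 - score) + eps, and frames
    are drawn without replacement via Efraimidis-Spirakis keys u ** (1 / weight).
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of frames")
    rng = random.Random(seed)
    keyed = []
    for idx, score in enumerate(scores):
        score = min(max(score, 0.0), 1.0)  # clamp confidence into [0, 1]
        weight = (1.0 - score) + eps       # low confidence -> high weight
        key = rng.random() ** (1.0 / weight)
        keyed.append((key, idx))
    # the k largest keys form a weighted sample without replacement
    chosen = sorted(keyed, reverse=True)[:k]
    return sorted(idx for _, idx in chosen)


# usage: per-frame mean keypoint confidence for 8 frames (illustrative values)
frame_scores = [0.95, 0.40, 0.92, 0.15, 0.97, 0.88, 0.30, 0.99]
print(score_aware_sample(frame_scores, k=3, seed=0))
```

Running the RGB encoder on only k of the frames is one way such a strategy could cut computation while still covering the frames where pose estimation is least reliable; the randomness keeps high-confidence frames from being excluded entirely.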

Problem

Research questions and friction points this paper is trying to address.

Sign Language Understanding
Effectiveness Improvement
Machine Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uni-Sign
Pre-training and Fine-tuning Strategy
PGF Module and Sampling