🤖 AI Summary
Existing sign language understanding (SLU) methods suffer from a significant gap between pretraining and downstream tasks, resulting in limited generalization and robustness. To address this, we propose Uni-Sign, a unified SLU framework: (1) We reformulate all SLU tasks—including translation, recognition, and comprehension—as a generative sign language translation task, establishing a unified task paradigm; (2) We introduce a Prior-Guided Fusion (PGF) module and a score-aware sampling strategy to robustly integrate pose and RGB multimodal features; (3) We construct CSL-News, a large-scale Chinese sign language video–text dataset comprising 1,985 hours of annotated data. Our framework achieves state-of-the-art performance across multiple benchmarks, with substantial improvements in translation accuracy, sign recognition, and semantic understanding. Both the source code and the CSL-News dataset are publicly released.
📝 Abstract
Sign language pre-training has gained increasing attention for its ability to enhance performance across various sign language understanding (SLU) tasks. However, existing methods often suffer from a gap between pre-training and fine-tuning, leading to suboptimal results. To address this, we propose Uni-Sign, a unified pre-training framework that eliminates the gap between pre-training and downstream SLU tasks through a large-scale generative pre-training strategy and a novel fine-tuning paradigm. First, we introduce CSL-News, a large-scale Chinese Sign Language (CSL) dataset containing 1,985 hours of video paired with textual annotations, which enables effective large-scale pre-training. Second, Uni-Sign unifies SLU tasks by treating downstream tasks as a single sign language translation (SLT) task during fine-tuning, ensuring seamless knowledge transfer between pre-training and fine-tuning. Furthermore, we incorporate a prior-guided fusion (PGF) module and a score-aware sampling strategy to efficiently fuse pose and RGB information, addressing keypoint inaccuracies and improving computational efficiency. Extensive experiments on multiple SLU benchmarks demonstrate that Uni-Sign achieves state-of-the-art performance across downstream SLU tasks. Dataset and code are available at [github.com/ZechengLi19/Uni-Sign](https://github.com/ZechengLi19/Uni-Sign).
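The core idea of the unified paradigm — casting every downstream SLU task as text generation so that fine-tuning matches generative pre-training — can be illustrated with a minimal, hypothetical sketch. This is not the authors' implementation; the helper `make_example` and the task names below are illustrative assumptions, standing in for whatever data pipeline feeds a shared encoder–decoder.

```python
# Illustrative sketch (not the paper's code): wrap heterogeneous SLU tasks
# as (video features -> target text) pairs, so one generative translation
# model and one training objective cover all of them.

def make_example(task: str, video_features, label) -> dict:
    """Cast an SLU task instance as a text-generation example."""
    if task == "islr":        # isolated sign recognition -> single gloss
        target = label                    # e.g. "hello"
    elif task == "cslr":      # continuous recognition -> gloss sequence
        target = " ".join(label)          # e.g. "hello world"
    elif task == "slt":       # translation -> spoken-language sentence
        target = label                    # e.g. "Hello, world."
    else:
        raise ValueError(f"unknown task: {task}")
    return {"inputs": video_features, "target_text": target}

examples = [
    make_example("islr", [0.1, 0.2], "hello"),
    make_example("cslr", [0.3, 0.4], ["hello", "world"]),
    make_example("slt",  [0.5, 0.6], "Hello, world."),
]
# All three tasks now share one output space (text), which is what lets a
# single generatively pre-trained model be fine-tuned without a task gap.
```

Under this framing, recognition and comprehension become special cases of translation, so the same decoding loop serves every benchmark.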