SLRTP2025 Sign Language Production Challenge: Methodology, Results, and Future Work

πŸ“… 2025-08-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Sign Language Production (SLP) has long suffered from the absence of standardized evaluation protocols, hindering model comparability and technical progress. To address this, we propose the first comprehensive, standardized benchmark framework for SLP, comprising a unified evaluation protocol, a high-quality sign language skeletal dataset (built upon RWTH-PHOENIX-Weather-2014T plus a hidden test set from a similar domain), and an open challenge competition mechanism. The challenge evaluates Text-to-Pose (T2P) translation, i.e., generating a sequence of skeleton poses from spoken language sentences. The inaugural challenge attracted 33 teams submitting 231 solutions; the top-performing team, whose method combines a retrieval-based framework with a pre-trained language model, achieved a BLEU-1 of 31.40 and a DTW-MJE of 0.0574, marking substantial improvements in motion naturalness and semantic fidelity. This work establishes the first reproducible, extensible evaluation benchmark for SLP, enabling rigorous cross-model comparison and fostering standardized development and collaborative advancement across the research community.

πŸ“ Abstract
Sign Language Production (SLP) is the task of generating sign language video from spoken language inputs. The field has seen a range of innovations over the last few years, with the introduction of deep learning-based approaches providing significant improvements in the realism and naturalness of generated outputs. However, the lack of standardized evaluation metrics for SLP approaches hampers meaningful comparisons across different systems. To address this, we introduce the first Sign Language Production Challenge, held as part of the third SLRTP Workshop at CVPR 2025. The competition's aim is to evaluate architectures that translate from spoken language sentences to a sequence of skeleton poses, known as Text-to-Pose (T2P) translation, over a range of metrics. For our evaluation data, we use the RWTH-PHOENIX-Weather-2014T dataset, a German Sign Language (Deutsche Gebärdensprache, DGS) weather broadcast dataset. In addition, we curate a custom hidden test set from a similar domain of discourse. This paper presents the challenge design and the winning methodologies. The challenge attracted 33 participants who submitted 231 solutions, with the top-performing team achieving a BLEU-1 score of 31.40 and a DTW-MJE of 0.0574. The winning approach utilized a retrieval-based framework and a pre-trained language model. As part of the workshop, we release a standardized evaluation network, including high-quality skeleton extraction-based keypoints, establishing a consistent baseline for the SLP field, which will enable future researchers to compare their work against a broader range of methods.
Problem

Research questions and friction points this paper is trying to address.

Standardizing evaluation metrics for Sign Language Production (SLP) systems.
Improving realism and naturalness of sign language video generation.
Establishing a baseline for Text-to-Pose (T2P) translation in SLP.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-based framework for sign language production
Pre-trained language model integration
Standardized skeleton extraction keypoints
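The DTW-MJE metric reported above can be made concrete. Below is a minimal sketch (not the challenge's released evaluation network, whose normalization and joint selection may differ) that aligns a predicted and a reference skeleton pose sequence with dynamic time warping, using the mean per-joint Euclidean error as the frame distance, and averages the accumulated cost over the warping path length:

```python
import numpy as np

def dtw_mje(pred, ref):
    """DTW-MJE sketch: pred and ref are arrays of shape (frames, joints, dims).

    Frame-to-frame cost is the mean Euclidean distance over joints; DTW then
    finds the minimum-cost monotonic alignment, and the total cost is
    normalized by the alignment path length.
    """
    T, U = len(pred), len(ref)
    # Pairwise frame cost: mean over joints of per-joint Euclidean distance.
    cost = np.array([[np.linalg.norm(p - r, axis=-1).mean() for r in ref]
                     for p in pred])
    # Standard DTW accumulation with match / insertion / deletion moves.
    acc = np.full((T + 1, U + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack to count the path length for normalization.
    i, j, steps = T, U, 0
    while i > 1 or j > 1:
        steps += 1
        moves = {(i - 1, j): acc[i - 1, j],
                 (i, j - 1): acc[i, j - 1],
                 (i - 1, j - 1): acc[i - 1, j - 1]}
        i, j = min(moves, key=moves.get)
    return acc[T, U] / (steps + 1)
```

An identical prediction and reference yield a score of 0.0, and lower is better; temporal misalignments are forgiven by the warping, so the metric isolates spatial joint error, which is why it is paired with back-translation BLEU scores that probe semantic fidelity instead.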
πŸ”Ž Similar Papers
No similar papers found.
Authors

Harry Walsh (University of Surrey): Sign Language Production, Computer Vision, Natural Language Processing
Ed Fish (University of Surrey)
Ozge Mercanoglu Sincan (Center for Vision, Speech and Signal Processing, University of Surrey): Computer Vision, Deep Learning
Mohamed Ilyes Lakhal (CVSSP, University of Surrey): Computer Vision, Deep Learning
Richard Bowden (Professor of Computer Vision and Machine Learning, CVSSP, University of Surrey): Computer Vision, Machine Learning, Artificial Intelligence
Neil Fox (University of Birmingham)
Bencie Woll (University College London)
Kepeng Wu (University of Science and Technology of China)
Zecheng Li (University of Science and Technology of China)
Weichao Zhao (University of Science and Technology of China)
Haodong Wang (University of Science and Technology of China)
Wengang Zhou (Professor, EEIS Department, University of Science and Technology of China): Multimedia Retrieval, Computer Vision, Computer Game
Houqiang Li (Professor, Department of Electronic Engineering and Information Science, University of Science and Technology of China): Multimedia Search, Image/Video Analysis, Image/Video Coding
Shengeng Tang (Hefei University of Technology)
Jiayi He (Hefei University of Technology)
Xu Wang (Hefei University of Technology)
Ruobei Zhang (Hefei University of Technology)
Yaxiong Wang (Hefei University of Technology (HFUT)): DeepFake Detection, Vision-Language, Image Generation/Segmentation, Computer Vision
Lechao Cheng (Associate Professor, Hefei University of Technology): Imbalanced Learning, Distillation, Noisy Label Learning, Weakly Supervised Learning, Visual Tuning
Meryem Tasyurek (Hacettepe University)
Tugce Kiziltepe (Hacettepe University)
Hacer Yalim Keles (Hacettepe University, Computer Engineering Department): Computer Vision, Machine Learning, Generative Adversarial Networks