🤖 AI Summary
Neural sign language production (SLP) suffers from high intra-class variability in sign movements and is susceptible to noise arising from signer morphology and stylistic idiosyncrasies. To address these challenges, we propose a robust SLP framework: first, we represent skeletal rotations as quaternions and enforce manifold-aware modeling of pose dynamics via a geodesic loss; second, we introduce a semantics-guided contrastive learning mechanism that explicitly disentangles semantic motion representations from signer-specific nuisance factors in the latent space; third, both components are integrated into a Progressive Transformers architecture. Evaluated on Phoenix14T, the contrastive objective alone improves keypoint accuracy by 16% over the baseline, and adding quaternion-based pose encoding reduces mean bone angle error by 6%, indicating improved robustness to signer variation and better semantic fidelity of the generated signs.
📝 Abstract
One of the main challenges in neural sign language production (SLP) lies in the high intra-class variability of signs, arising from signer morphology and stylistic variety in the training data. To improve robustness to such variations, we propose two enhancements to the standard Progressive Transformers (PT) architecture (Saunders et al., 2020). First, we encode poses as bone rotations in quaternion space and train with a geodesic loss to improve the accuracy and clarity of angular joint movements. Second, we introduce a contrastive loss that structures decoder embeddings by semantic similarity, measured either by gloss overlap or by SBERT-based sentence similarity, with the aim of filtering out anatomical and stylistic features that carry no relevant semantic information. On the Phoenix14T dataset, the contrastive loss alone yields a 16% improvement in Probability of Correct Keypoint (PCK) over the PT baseline. When combined with quaternion-based pose encoding, the model achieves a 6% reduction in Mean Bone Angle Error. These results indicate that Transformer-based SLP models benefit from explicit skeletal structure modeling and from semantically guided contrastive objectives on sign pose representations.
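To make the two proposed objectives concrete, the following is a minimal NumPy sketch of (a) a geodesic loss between unit quaternions and (b) a margin-based contrastive loss on pooled embeddings. This is an illustration under our own assumptions, not the authors' implementation: the function names, the margin formulation, and the binary `similar` label (which would stand in for a thresholded gloss-overlap or SBERT similarity score) are all hypothetical.

```python
import numpy as np

def geodesic_quat_loss(q_pred, q_true):
    """Geodesic angle between predicted and target unit quaternions.

    Hypothetical sketch: q and -q represent the same rotation, so the
    absolute dot product is used. Returns the rotation angle (radians)
    separating the two orientations, per joint.
    """
    q_pred = q_pred / np.linalg.norm(q_pred, axis=-1, keepdims=True)
    q_true = q_true / np.linalg.norm(q_true, axis=-1, keepdims=True)
    dot = np.abs(np.sum(q_pred * q_true, axis=-1))
    # Clip guards against arccos domain errors from float round-off.
    return 2.0 * np.arccos(np.clip(dot, 0.0, 1.0))

def contrastive_loss(z_a, z_b, similar, margin=1.0):
    """Margin-based contrastive loss on two pooled decoder embeddings.

    `similar` is 1 when the underlying sentences count as semantically
    similar (e.g. high gloss overlap or SBERT score), else 0. Similar
    pairs are pulled together; dissimilar pairs are pushed apart until
    they are at least `margin` away.
    """
    d = np.linalg.norm(z_a - z_b)
    return similar * d**2 + (1 - similar) * max(margin - d, 0.0) ** 2
```

For example, a quaternion and its negation incur zero geodesic loss (same rotation), while a dissimilar embedding pair already farther apart than the margin contributes nothing, so the objective only separates pairs that are still too close.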