AI Summary
This work addresses the challenge of modeling the multi-scale, fine-grained motions in sign language, ranging from subtle finger movements to full-body dynamics, which existing methods struggle to capture. To this end, the authors propose a recurrent Transformer architecture that replaces conventional stacked layers with a parameter-shared recursive mechanism, iteratively refining latent representations to enhance expressive power. A novel geometry-aware contrastive learning strategy is introduced, enabling adaptive Poincaré alignment between skeletal and textual features in hyperbolic space to better preserve semantic structure. The proposed method achieves state-of-the-art performance on the WLASL and MSASL benchmarks while using fewer network layers, validating the efficacy of recursive refinement and geometry-aware representation learning for sign language understanding.
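As a concrete illustration of depth-from-recurrence, the sketch below applies one parameter-shared transformer block in a loop instead of stacking distinct layers. This is a minimal reading of the idea, not the paper's exact design: the class name, layer sizes, and loop count are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's configuration): depth comes from
# reusing a single transformer block T times rather than stacking T layers.
import torch
import torch.nn as nn

class LoopedEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_loops=6):
        super().__init__()
        # One block whose weights are shared across every iteration.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        # Revisit the latent sequence repeatedly under shared parameters,
        # progressively refining the motion representation.
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

# Usage: refine a batch of skeletal feature sequences.
feats = torch.randn(4, 64, 256)   # (batch, frames, d_model)
refined = LoopedEncoder()(feats)  # same shape, iteratively refined
```

Because the loop reuses one set of weights, the effective depth grows with the number of iterations while the unique-parameter count stays fixed, which is the trade-off the summary highlights.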
Abstract
Skeleton-based isolated sign language recognition (ISLR) demands fine-grained understanding of articulated motion across multiple spatial scales, from subtle finger movements to global body dynamics. Existing approaches typically rely on deep feed-forward architectures, which increase model capacity but lack mechanisms for recurrent refinement and structured representation. We propose LA-Sign, a looped transformer framework with geometry-aware alignment for ISLR. Instead of stacking deeper layers, LA-Sign derives its depth from recurrence, repeatedly revisiting latent representations to progressively refine motion understanding under shared parameters. To further regularise this refinement process, we present a geometry-aware contrastive objective that projects skeletal and textual features into an adaptive hyperbolic space, encouraging multi-scale semantic organisation. We study three looping designs and multiple geometric manifolds, demonstrating that encoder-decoder looping combined with adaptive Poincaré alignment yields the strongest performance. Extensive experiments on the WLASL and MSASL benchmarks show that LA-Sign achieves state-of-the-art results while using fewer unique layers, highlighting the effectiveness of recurrent latent refinement and geometry-aware representation learning for sign language recognition.
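For intuition on the geometry-aware objective, here is a minimal sketch of one way adaptive Poincaré alignment could be realised: Euclidean skeletal and text features are mapped into a Poincaré ball with learnable curvature via the exponential map at the origin, and matched pairs are pulled together by an InfoNCE-style loss over negative hyperbolic distances. The class, the learnable curvature parameterisation, and the loss form are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of geometry-aware contrastive alignment in a Poincare
# ball with learnable curvature c (the "adaptive" part is an assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoincareAlign(nn.Module):
    def __init__(self, init_c=1.0, temperature=0.1):
        super().__init__()
        self.log_c = nn.Parameter(torch.tensor(init_c).log())  # curvature > 0
        self.temperature = temperature

    def expmap0(self, v):
        # Exponential map at the origin: exp_0(v) = tanh(sqrt(c)*||v||) * v / (sqrt(c)*||v||).
        c = self.log_c.exp()
        norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-6)
        return torch.tanh(c.sqrt() * norm) * v / (c.sqrt() * norm)

    def dist(self, x, y):
        # Pairwise Poincare distance for curvature c:
        # d(x, y) = (1/sqrt(c)) * acosh(1 + 2c||x-y||^2 / ((1-c||x||^2)(1-c||y||^2))).
        c = self.log_c.exp()
        x2 = (x * x).sum(-1, keepdim=True)        # (N, 1)
        y2 = (y * y).sum(-1, keepdim=True).T      # (1, N)
        sqdist = torch.cdist(x, y).pow(2)         # ||x - y||^2, all pairs
        den = (1 - c * x2).clamp_min(1e-6) * (1 - c * y2).clamp_min(1e-6)
        arg = 1 + (2 * c * sqdist / den).clamp_min(0) + 1e-6  # eps for stability
        return torch.acosh(arg) / c.sqrt()

    def forward(self, skel, text):
        # Align the i-th skeletal clip with the i-th gloss; other pairs are negatives.
        u, v = self.expmap0(skel), self.expmap0(text)
        logits = -self.dist(u, v) / self.temperature  # closer => higher score
        labels = torch.arange(u.size(0), device=u.device)
        return F.cross_entropy(logits, labels)

# Usage: Euclidean features from the skeleton and text encoders.
loss = PoincareAlign()(torch.randn(8, 128), torch.randn(8, 128))
```

Letting the curvature be learnable is one plausible reading of "adaptive hyperbolic space": the model can flatten the geometry towards Euclidean (small c) or sharpen its hierarchical structure (large c) as training demands.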