SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing sign language recognition approaches, which rely predominantly on visual data and are therefore sensitive to variable lighting and occlusion, raise privacy concerns, and lack cross-modal diversity. To overcome these issues, the study presents a large-scale multimodal sign language dataset that combines an RGB-D camera, a millimeter-wave radar, and two wrist-worn inertial measurement units (IMUs), capturing visual, radio-frequency reflection, and kinematic signals with millisecond-level temporal alignment. The dataset contains 93,545 synchronized samples covering 160 American Sign Language signs performed by 20 participants, accompanied by standardized preprocessing pipelines and both user-dependent and user-independent evaluation protocols. Experiments validate the dataset's quality and indicate that it can support the development of robust, generalizable, and privacy-preserving sign language recognition systems.
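The millisecond-level alignment mentioned above can be illustrated with a nearest-timestamp match between two sensor streams. This is only a minimal sketch, not the dataset's actual synchronization framework: the function name `align_nearest`, the 5 ms tolerance, and the assumption that every stream carries sorted timestamps in milliseconds on a shared clock are all hypothetical.

```python
from bisect import bisect_left


def align_nearest(ref_ts, other_ts, tolerance_ms=5.0):
    """Match each reference timestamp to the nearest timestamp in another stream.

    Both lists are assumed to be sorted and expressed in milliseconds on a
    shared clock (an assumption for this sketch). Returns, for each reference
    timestamp, the index of the closest entry in `other_ts`, or None when the
    closest entry is further away than `tolerance_ms`.
    """
    matches = []
    for t in ref_ts:
        i = bisect_left(other_ts, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(other_ts[j] - t))
        matches.append(best if abs(other_ts[best] - t) <= tolerance_ms else None)
    return matches


# Hypothetical example: camera frames at roughly 30 Hz, IMU samples at 100 Hz.
camera_ts = [0.0, 33.3, 66.7]
imu_ts = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
pairs = align_nearest(camera_ts, imu_ts)
```

Returning `None` for unmatched frames, rather than forcing a match, lets a downstream loader drop or interpolate samples whose modalities drift beyond the chosen tolerance.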

📝 Abstract

Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are limited by sensitivity to lighting and occlusion, privacy concerns, and a lack of cross-modal diversity. To address these challenges, we introduce SIGMA-ASL, a large-scale multimodal dataset for SLR. The dataset integrates an Azure Kinect RGB-D camera, a millimeter-wave (mmWave) radar, and two wrist-worn inertial measurement units (IMUs) to capture complementary visual, radio-reflection, and kinematic information. Collected in a controlled studio environment with 20 participants performing 160 common American Sign Language (ASL) signs, SIGMA-ASL provides 93,545 temporally synchronized word-level multimodal clips. A unified sensing framework achieves millisecond-level alignment across modalities, enabling reliable sensor fusion and cross-modal learning. We further design standardized preprocessing pipelines and benchmarking protocols under both user-dependent and user-independent settings, offering a comprehensive foundation for evaluating single-modality and multimodal SLR. Extensive experiments validate the dataset's quality and demonstrate its potential as a valuable resource for developing robust, privacy-preserving, and ubiquitous sign language recognition systems.
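The difference between the user-dependent and user-independent settings comes down to how clips are partitioned. The sketch below shows one common way to build each split; it is an illustration under stated assumptions, not the benchmark's published protocol. The record fields `participant` and `sign`, the function names, and the 20% test fraction are hypothetical.

```python
import random


def user_independent_split(samples, test_participants):
    """Hold out whole participants, so test signers are never seen in training."""
    test_ids = set(test_participants)
    train = [s for s in samples if s["participant"] not in test_ids]
    test = [s for s in samples if s["participant"] in test_ids]
    return train, test


def user_dependent_split(samples, test_fraction=0.2, seed=0):
    """Split clips within each (participant, sign) group.

    Every signer then appears in both train and test, while individual
    clips stay disjoint between the two sets.
    """
    rng = random.Random(seed)
    groups = {}
    for s in samples:
        groups.setdefault((s["participant"], s["sign"]), []).append(s)

    train, test = [], []
    for key in sorted(groups):
        clips = groups[key][:]
        rng.shuffle(clips)
        # Keep at least one test clip per group when the group has more than one clip.
        n_test = max(1, int(len(clips) * test_fraction)) if len(clips) > 1 else 0
        test.extend(clips[:n_test])
        train.extend(clips[n_test:])
    return train, test


# Hypothetical toy metadata: 4 participants, 3 signs, 5 repetitions each.
toy = [
    {"participant": p, "sign": s, "clip": c}
    for p in range(4) for s in range(3) for c in range(5)
]
ui_train, ui_test = user_independent_split(toy, test_participants=[3])
ud_train, ud_test = user_dependent_split(toy)
```

The user-independent split is the stricter test of generalization, because accuracy there cannot benefit from signer-specific habits that the model memorized during training.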

Problem

Research questions and friction points this paper is trying to address.

sign language recognition

vision-based datasets

occlusion

privacy concerns

cross-modal diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal fusion

mmWave radar

inertial measurement unit

sensor synchronization

sign language recognition
