🤖 AI Summary
This work addresses the cross-modal generation problem of synthesizing 3D hand motions from natural language descriptions—encompassing hand shapes, spatial positions, and finger/arm dynamics. To overcome the scarcity of annotated text-motion pairs, we propose HandMDM, the first text-to-3D-hand-motion diffusion model. Our method leverages large-scale sign language videos and large language models, augmented with a sign-language attribute lexicon and motion script cues, to automatically generate high-quality pseudo-labeled text-motion data. We then train a text-conditioned diffusion model on this data. HandMDM achieves strong cross-domain generalization—uniquely supporting unseen sign classes, heterogeneous sign language systems, and non-sign gestures—while producing high-fidelity, temporally coherent 3D hand motions across diverse scenarios. To foster research in embodied interaction and sign language technology, we will publicly release the code, model, and dataset.
📝 Abstract
Our goal is to train a generative model of 3D hand motions, conditioned on natural language descriptions specifying motion characteristics such as handshapes, locations, and finger/hand/arm movements. To this end, we automatically build pairs of 3D hand motions and their associated textual labels at an unprecedented scale. Specifically, we leverage a large-scale sign language video dataset, along with noisy pseudo-annotated sign categories, which we translate into hand motion descriptions via an LLM that utilizes a dictionary of sign attributes, as well as our complementary motion-script cues. This data enables training a text-conditioned hand motion diffusion model, HandMDM, that is robust across domains: not only unseen sign categories from the same sign language, but also signs from another sign language and non-sign hand movements. We contribute an extensive experimental investigation of these scenarios and will make our trained models and data publicly available to support future research in this relatively new field.
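The abstract describes training a text-conditioned diffusion model on pseudo-labeled motion data. As a rough illustration only (the paper's actual architecture and parameterization are not specified here), the standard DDPM-style forward noising that such training relies on can be sketched as follows; the dimensions, schedule, and names below are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Illustrative sketch of DDPM-style forward noising, the core of training
# a text-conditioned motion diffusion model. All names/dimensions are
# assumptions for illustration, not the paper's actual choices.

rng = np.random.default_rng(0)

T = 1000                               # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule (common default)
alpha_bars = np.cumprod(1.0 - betas)   # cumulative signal-retention factors

def noise_motion(x0, t, eps):
    """Forward process q(x_t | x_0): blend clean motion with Gaussian noise."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

# A hypothetical hand-motion clip: 60 frames x 45 pose parameters.
x0 = rng.standard_normal((60, 45))
eps = rng.standard_normal(x0.shape)
x_t = noise_motion(x0, t=500, eps=eps)

# A denoiser f(x_t, t, text_emb) would then be trained to predict eps,
# conditioned on an embedding of the text description:
#   loss = ||f(x_t, t, text_emb) - eps||^2
print(x_t.shape)
```

At sampling time, the trained denoiser would be applied iteratively from pure noise, with the text embedding steering the generated motion.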