HandX: Scaling Bimanual Motion and Interaction Generation

📅 2026-03-30
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Existing methods struggle to generate realistic bimanual interaction motions with fine-grained finger movements, temporally coherent contacts, and natural coordination. This work presents the first large-scale, high-quality motion dataset specifically focused on detailed bimanual interactions, paired with a decoupled semantic annotation strategy powered by large language models. Building upon this foundation, we introduce a unified framework for bimanual motion generation that supports both conditional diffusion and autoregressive modeling. The study further proposes novel evaluation metrics tailored to hand interaction, demonstrating high-fidelity, semantically consistent generation of dexterous bimanual motions. Our results underscore the critical roles of data quality and model scale in generation performance, and we release the dataset publicly to foster future research in this domain.
📝 Abstract
Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior (finger articulation, contact timing, and inter-hand coordination), and existing resources lack high-fidelity bimanual sequences that capture nuanced finger dynamics and collaboration. To fill this gap, we present HandX, a unified foundation spanning data, annotation, and evaluation. We consolidate and filter existing datasets for quality, and collect a new motion-capture dataset targeting underrepresented bimanual interactions with detailed finger dynamics. For scalable annotation, we introduce a decoupled strategy that extracts representative motion features, e.g., contact events and finger flexion, and then leverages reasoning from large language models to produce fine-grained, semantically rich descriptions aligned with these features. Building on the resulting data and annotations, we benchmark diffusion and autoregressive models with versatile conditioning modes. Experiments demonstrate high-quality dexterous motion generation, supported by our newly proposed hand-focused metrics. We further observe clear scaling trends: larger models trained on larger, higher-quality datasets produce more semantically coherent bimanual motion. Our dataset is released to support future research.
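The decoupled annotation strategy described in the abstract (first extract symbolic motion features such as contact events and finger flexion, then have a large language model verbalize them) can be sketched roughly as follows. Every name, threshold, and feature choice here is an illustrative assumption rather than the paper's actual pipeline, and the LLM step is reduced to prompt construction.

```python
# Hypothetical two-stage annotation sketch (not the paper's implementation):
# stage 1 turns raw per-frame hand data into symbolic features,
# stage 2 serializes those features into a text prompt for an LLM.

CONTACT_THRESHOLD = 0.015  # assumed: fingertip distance (metres) counting as contact


def extract_features(left_tips, right_tips, flexion_deg):
    """Stage 1: derive symbolic features from per-frame fingertip positions
    (3-tuples, one per frame) and per-frame finger-flexion angles."""
    contact_frames = []
    for t, (l, r) in enumerate(zip(left_tips, right_tips)):
        dist = sum((a - b) ** 2 for a, b in zip(l, r)) ** 0.5
        if dist < CONTACT_THRESHOLD:
            contact_frames.append(t)
    mean_flex = sum(flexion_deg) / len(flexion_deg)
    return {"contact_frames": contact_frames,
            "mean_flexion_deg": round(mean_flex, 1)}


def build_prompt(features):
    """Stage 2: hand the extracted features to an LLM as text
    (the actual model call is omitted in this sketch)."""
    return ("Describe a bimanual hand motion with these features:\n"
            f"- fingertip contacts at frames {features['contact_frames']}\n"
            f"- mean finger flexion {features['mean_flexion_deg']} degrees")


features = extract_features(
    left_tips=[(0, 0, 0), (0, 0, 0.01)],
    right_tips=[(0, 0, 0.5), (0, 0, 0.02)],
    flexion_deg=[40, 60],
)
prompt = build_prompt(features)
```

Decoupling matters here because the LLM never sees raw joint trajectories: it only reasons over compact, verifiable features, which is what makes the annotation scalable and keeps descriptions aligned with the motion.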
Problem

Research questions and friction points this paper is trying to address.

bimanual interaction, hand motion, dexterous behavior, finger dynamics, motion synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

bimanual interaction, hand motion generation, decoupled annotation, large language models, motion capture dataset
👥 Authors

Zimu Zhang
University of Illinois Urbana-Champaign

Yucheng Zhang
Purdue University
Knowledge Graph, Large Language Models

Xiyan Xu
University of Illinois Urbana-Champaign

Ziyin Wang
University of Illinois Urbana-Champaign

Sirui Xu
University of Illinois at Urbana-Champaign
Computer Vision, Machine Learning, Virtual Humans, Character Animation, Human-Object Interaction

Kai Zhou
Specs Inc.

Bing Zhou
Snap Research
Human Motion Generation, Video Generation, Human Computer Interaction

Chuan Guo
Snap Inc.

Jian Wang
Snap Inc.
Computer Vision, Signal Processing

Yu-Xiong Wang
University of Illinois Urbana-Champaign

Liang-Yan Gui
University of Illinois Urbana-Champaign