FUSION: Full-Body Unified Motion Prior for Body and Hands via Diffusion

πŸ“… 2026-01-07
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing full-body motion synthesis methods often neglect hand movements or generate motions only under constrained scenarios, in part because large-scale, diverse datasets that jointly capture body motion and fine-grained hand articulation are lacking. To address this gap, this work integrates multi-source hand and body motion data into unified full-body motion sequences and proposes FUSION, the first unconditional diffusion-based motion prior for full-body (including fingers) synthesis. FUSION enables fine-grained interactive motions driven by either object trajectories or natural-language constraints generated by large language models. Experiments demonstrate that FUSION outperforms state-of-the-art skeleton-aware control models on the HumanML3D keypoint tracking task, producing more natural motions while achieving high-precision hand control and coherent full-body coordination in both object-interaction and self-interaction tasks.

πŸ“ Abstract
Hands are central to interacting with our surroundings and conveying gestures, making their inclusion essential for full-body motion synthesis. Despite this, existing human motion synthesis methods fall short: some ignore hand motions entirely, while others generate full-body motions only for narrowly scoped tasks under highly constrained settings. A key obstacle is the lack of large-scale datasets that jointly capture diverse full-body motion with detailed hand articulation. While some datasets capture both, they are limited in scale and diversity. Conversely, large-scale datasets typically focus either on body motion without hands or on hand motions without the body. To overcome this, we curate and unify existing hand motion datasets with large-scale body motion data to generate full-body sequences that capture both hand and body. We then propose the first diffusion-based unconditional full-body motion prior, FUSION, which jointly models body and hand motion. Despite using a pose-based motion representation, FUSION surpasses state-of-the-art skeletal control models on the Keypoint Tracking task in the HumanML3D dataset and achieves superior motion naturalness. Beyond standard benchmarks, we demonstrate that FUSION can go beyond typical uses of motion priors through two applications: (1) generating detailed full-body motion including fingers during interaction given the motion of an object, and (2) generating Self-Interaction motions using an LLM to transform natural language cues into actionable motion constraints. For these applications, we develop an optimization pipeline that refines the latent space of our diffusion model to generate task-specific motions. Experiments on these tasks highlight precise control over hand motion while maintaining plausible full-body coordination. The code will be public.
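The abstract mentions an optimization pipeline that refines the latent space of the diffusion model so that generated motions satisfy task constraints (e.g. hand keypoints matching an object trajectory). The paper's actual pipeline is not specified here, so the following is only a minimal toy sketch of the general idea: a frozen generator maps a latent to motion features, and gradient descent on the latent minimizes a quadratic penalty over the constrained coordinates. The linear `decode`, the dimensions, and the constraint mask are all illustrative assumptions, standing in for the frozen diffusion denoiser and real hand-joint constraints.

```python
import numpy as np

# Toy stand-in for latent-space optimization against motion constraints.
# In the real system the "decoder" would be the frozen diffusion model's
# denoising process; here it is a fixed random linear map (assumption).
rng = np.random.default_rng(0)
latent_dim, motion_dim = 8, 16
W = rng.standard_normal((motion_dim, latent_dim))

def decode(z):
    """Map a latent vector to full-body motion features (linear toy model)."""
    return W @ z

def constraint_loss(motion, target, mask):
    """Quadratic penalty on the constrained coordinates only."""
    diff = mask * (motion - target)
    return 0.5 * float(diff @ diff)

# Hypothetical task: constrain only the first 4 motion dims ("hand joints")
# to match a target; the remaining dims are left free, as the prior would
# fill them in with plausible full-body motion.
target = rng.standard_normal(motion_dim)
mask = np.zeros(motion_dim)
mask[:4] = 1.0

z = np.zeros(latent_dim)
lr = 0.05
for _ in range(500):
    residual = mask * (decode(z) - target)
    grad = W.T @ residual  # analytic gradient of the quadratic loss w.r.t. z
    z -= lr * grad

final = constraint_loss(decode(z), target, mask)
print(f"final constraint loss: {final:.2e}")
```

Because the latent has more degrees of freedom than there are constrained coordinates, the loss can be driven essentially to zero while the unconstrained dimensions remain free, which mirrors why a strong motion prior is needed on top of such an optimization to keep the rest of the body plausible.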
Problem

Research questions and friction points this paper is trying to address.

full-body motion synthesis
hand motion modeling
motion prior
dataset limitation
human motion generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
full-body motion synthesis
hand articulation
motion prior
latent space optimization