Diffgrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model

📅 2024-12-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the challenge of jointly ensuring grasp plausibility, full-body coordination, and motion naturalness in whole-body human motion synthesis involving dynamic object interaction, this paper proposes the first end-to-end diffusion-based generative framework that jointly models full-body pose, dexterous hand grasping, and object trajectories. Methodologically, we introduce a contact-aware loss function and a data-driven motion guidance mechanism, integrating geometric contact constraints, kinematic priors, and object pose-conditioned encoding. Compared to state-of-the-art methods, our approach achieves significant improvements in three key aspects: physical plausibility of grasps, spatiotemporal coherence among body, hands, and object, and temporal smoothness of motion sequences. Quantitative evaluations—including Contact F1, Jitter, and Fréchet Inception Distance (FID)—and qualitative visual results consistently demonstrate superior performance.
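The summary above names Jitter as one of the quantitative metrics. As an illustration only, here is a minimal sketch of a joint-position jitter measure computed as the mean jerk magnitude (third-order finite difference of joint positions); the function name, array layout, and fps scaling are assumptions, not the paper's actual implementation.

```python
import numpy as np

def jitter(positions: np.ndarray, fps: float = 30.0) -> float:
    """Mean jerk magnitude over a motion sequence (illustrative sketch).

    positions: (T, J, 3) array of J joint positions over T frames.
    Returns the average L2 norm of the third-order finite difference,
    scaled by fps**3 so the result is in position-units per second^3.
    """
    # Third-order finite difference approximates the jerk d^3x/dt^3;
    # smoother motion yields smaller values.
    jerk = np.diff(positions, n=3, axis=0) * fps**3
    return float(np.linalg.norm(jerk, axis=-1).mean())
```

A constant-velocity trajectory has zero jerk, so this measure is zero for it; abrupt frame-to-frame changes drive it up.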

📝 Abstract
Generating high-quality whole-body human-object interaction motion sequences is becoming increasingly important in fields such as animation, VR/AR, and robotics. The main challenge of this task lies in determining the level of involvement of each hand given the complex shapes of objects of different sizes and their different motion trajectories, while ensuring strong grasping realism and guaranteeing coordinated movement across all body parts. In contrast to existing work, which either generates human interaction motion sequences without detailed hand grasping poses or models only a static grasping pose, we propose a simple yet effective framework that jointly models the relationship between the body, the hands, and the given object motion sequences within a single diffusion model. To guide our network in perceiving the object's spatial position and learning more natural grasping poses, we introduce novel contact-aware losses and incorporate a carefully designed, data-driven guidance. Experimental results demonstrate that our approach outperforms the state-of-the-art method and generates plausible whole-body motion sequences.
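The abstract refers to "contact-aware losses" that guide the network toward natural grasping poses. The paper does not give the exact formulation here, so the following is a hedged sketch of one generic form such a loss can take: for hand vertices expected to be in contact, penalize their distance to the nearest object surface point beyond a small tolerance. All names (`contact_loss`, `thresh`) and the hinge formulation are illustrative assumptions.

```python
import numpy as np

def contact_loss(hand_verts: np.ndarray,
                 obj_points: np.ndarray,
                 contact_mask: np.ndarray,
                 thresh: float = 0.005) -> float:
    """Generic contact-aware loss sketch (not the paper's exact loss).

    hand_verts:   (V, 3) predicted hand vertex positions.
    obj_points:   (P, 3) points sampled on the object surface.
    contact_mask: (V,) boolean, True for vertices expected to touch.
    Pulls expected-contact vertices onto the object surface by penalizing
    their nearest-point distance beyond a small tolerance `thresh`.
    """
    if not contact_mask.any():
        return 0.0
    # Pairwise distances between hand vertices and object points: (V, P)
    d = np.linalg.norm(hand_verts[:, None, :] - obj_points[None, :, :], axis=-1)
    nearest = d.min(axis=1)  # per-vertex distance to the object surface
    # Hinge: only distances larger than the tolerance contribute.
    return float(np.maximum(nearest - thresh, 0.0)[contact_mask].mean())
```

In practice such a term would be combined with standard diffusion reconstruction losses; the tolerance keeps the loss from fighting small interpenetration-avoidance margins.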
Problem

Research questions and friction points this paper is trying to address.

Human-Object Interaction
Motion Synthesis
Natural Grasping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffgrasp
Diffusion Model
Dynamic Grasping
Yonghao Zhang
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Qiang He
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yanguang Wan
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yinda Zhang
Google Research
Computer Vision; Computer Graphics; Deep Learning; Scene Understanding; Digital Human
Xiaoming Deng
Institute of Software, CAS
Computer Vision; Robotic Manipulation; Natural User Interfaces; Virtual Humans; Hand Tracking
Cuixia Ma
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Hongan Wang
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences