3D Whole-body Grasp Synthesis with Directional Controllability

📅 2024-08-29
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenge of generating physically plausible, direction-controllable 3D full-body grasping motions for animation, mixed reality, and robotics. Existing methods struggle to jointly model hand–object–scene (e.g., receptacle) interactions, suffering from uncontrolled grasp orientation, scene penetration, and inefficient optimization. To overcome these limitations, the authors propose the first generative framework incorporating early-stage geometric reasoning: (1) ray-casting and collision detection to model reachable grasp directions; (2) unified constraints on arm and palm orientation; (3) symmetric left- and right-hand grasp synthesis; and (4) probabilistic directional sampling, geometry-aware conditional generation, and contact-consistent full-body optimization. The method achieves significant improvements over the state of the art on the GRAB and ReplicaGrasp benchmarks—higher physical plausibility and success rates, faster inference, and lower computational cost. Ablation studies confirm consistent gains from each component. Code and models will be publicly released.

📝 Abstract
Synthesizing 3D whole-bodies that realistically grasp objects is useful for animation, mixed reality, and robotics. This is challenging, because the hands and body need to look natural w.r.t. each other, the grasped object, as well as the local scene (i.e., a receptacle supporting the object). Only recent work tackles this, with a divide-and-conquer approach; it first generates a "guiding" right-hand grasp, and then searches for bodies that match this. However, the guiding-hand synthesis lacks controllability and receptacle awareness, so it likely has an implausible direction (i.e., a body can't match this without penetrating the receptacle) and needs corrections through major post-processing. Moreover, the body search needs exhaustive sampling and is expensive. These are strong limitations. We tackle these with a novel method called CWGrasp. Our key idea is that performing geometry-based reasoning "early on," instead of "too late," provides rich "control" signals for inference. To this end, CWGrasp first samples a plausible reaching-direction vector (used later for both the arm and hand) from a probabilistic model built via raycasting from the object and collision checking. Then, it generates a reaching body with a desired arm direction, as well as a "guiding" grasping hand with a desired palm direction that complies with the arm's one. Eventually, CWGrasp refines the body to match the "guiding" hand, while plausibly contacting the scene. Notably, generating already-compatible "parts" greatly simplifies the "whole." Moreover, CWGrasp uniquely tackles both right- and left-hand grasps. We evaluate on the GRAB and ReplicaGrasp datasets. CWGrasp outperforms baselines, at lower runtime and budget, while all components help performance. Code and models will be released.
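The core "early reasoning" step — building a distribution over plausible reaching directions via raycasting from the object and collision checking — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes obstacles (the receptacle and scene) are approximated by axis-aligned boxes, uses a Fibonacci lattice for ray directions, and samples uniformly over unobstructed directions, whereas CWGrasp builds a richer probabilistic model. All function names here are hypothetical.

```python
import math
import random


def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit directions on a sphere."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        y = 1.0 - 2.0 * i / (n - 1)           # y runs from 1 to -1
        r = math.sqrt(max(0.0, 1.0 - y * y))  # radius of the horizontal circle
        theta = golden * i
        dirs.append((math.cos(theta) * r, y, math.sin(theta) * r))
    return dirs


def ray_hits_aabb(origin, direction, box_min, box_max, max_t):
    """Slab test: does the ray hit the axis-aligned box within distance max_t?"""
    t_near, t_far = 0.0, max_t
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-9:
            if o < lo or o > hi:      # ray parallel to slab and outside it
                return False
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            if t1 > t2:
                t1, t2 = t2, t1
            t_near, t_far = max(t_near, t1), min(t_far, t2)
            if t_near > t_far:
                return False
    return True


def sample_reach_direction(obj_center, obstacles, n_rays=256, reach=0.6, rng=None):
    """Cast rays outward from the object; keep directions whose ray stays
    collision-free within arm's reach, then sample one uniformly.
    Returns None if every direction is obstructed."""
    rng = rng or random.Random(0)
    free = [d for d in fibonacci_sphere(n_rays)
            if not any(ray_hits_aabb(obj_center, d, lo, hi, reach)
                       for lo, hi in obstacles)]
    return rng.choice(free) if free else None
```

For example, with a wall-like box to one side of the object, only directions pointing away from the wall survive, so the sampled vector is guaranteed to be reachable — exactly the kind of "control" signal the paper feeds to the downstream body and hand generators.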
Problem

Research questions and friction points this paper is trying to address.

Synthesizing realistic 3D whole-body grasps for animation and robotics.
Addressing lack of controllability in guiding-hand synthesis methods.
Reducing exhaustive sampling and computational costs in body search.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Early-stage geometry-based reasoning
Probabilistic reaching-direction sampling
Both left- and right-hand grasp synthesis
Georgios Paschalidis
University of Amsterdam, the Netherlands
Romana Wilschut
University of Amsterdam, the Netherlands
Dimitrije Antić
University of Amsterdam, the Netherlands
Omid Taheri
Max Planck Institute for Intelligent Systems
Dimitrios Tzionas
University of Amsterdam, the Netherlands

Human-Object Interaction, Computer Vision, Deep Learning, Machine Learning, Motion Tracking