🤖 AI Summary
Existing methods for generating full-body motion in 3D scenes struggle to deliver both physical plausibility and fine-grained grasp fidelity: scene-aware models tend to neglect hand-object interaction, while grasp-focused models ignore the surrounding environment. To address this, the paper introduces MOGRAS (Human MOtion with GRAsping in 3D Scenes), a large-scale dataset that pairs pre-grasping full-body walking motions and final grasping poses with richly annotated 3D indoor scenes. Benchmarking existing full-body grasping methods on MOGRAS exposes their limitations in scene-aware generation, and the paper further proposes a simple yet effective adaptation that lets these methods operate seamlessly within 3D scenes. Extensive quantitative and qualitative experiments validate the dataset and show significant improvements from the proposed method over prior approaches.
📝 Abstract
Generating realistic full-body motion that interacts with objects is critical for applications in robotics, virtual reality, and human-computer interaction. While existing methods can generate full-body motion within 3D scenes, they often lack the fidelity needed for fine-grained tasks such as object grasping. Conversely, methods that generate precise grasping motions typically ignore the surrounding 3D scene. Bridging this gap, that is, generating full-body grasping motions that are physically plausible within a 3D scene, remains a significant challenge. To address it, we introduce MOGRAS (Human MOtion with GRAsping in 3D Scenes), a large-scale dataset that fills this need. MOGRAS provides pre-grasping full-body walking motions and final grasping poses within richly annotated 3D indoor scenes. We leverage MOGRAS to benchmark existing full-body grasping methods and demonstrate their limitations in scene-aware generation. Furthermore, we propose a simple yet effective method that adapts existing approaches to work seamlessly within 3D scenes. Through extensive quantitative and qualitative experiments, we validate the effectiveness of our dataset and highlight the significant improvements our proposed method achieves, paving the way for more realistic human-scene interactions.
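To make the dataset's structure concrete, here is a minimal Python sketch of what one MOGRAS-style record could look like, pairing a pre-grasping walking motion with a final grasping pose in an annotated scene. All field names, shapes, and identifiers below (`MograsSample`, `walk_poses`, `grasp_pose`, etc.) are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MograsSample:
    """One hypothetical record: a pre-grasping walk plus a final grasp in a scene.

    Field names and array shapes are assumptions for illustration only.
    """
    scene_id: str                  # annotated 3D indoor scene identifier
    object_id: str                 # target object to be grasped
    walk_poses: np.ndarray         # (T, D) per-frame full-body pose parameters
    walk_translations: np.ndarray  # (T, 3) per-frame root translations
    grasp_pose: np.ndarray         # (D,) pose parameters of the final grasp
    grasp_translation: np.ndarray  # (3,) root translation of the final grasp


def is_consistent(sample: MograsSample) -> bool:
    """Check that the walking poses and translations align frame by frame."""
    return sample.walk_poses.shape[0] == sample.walk_translations.shape[0]


# Toy usage with zero-filled arrays (T = 120 frames, D = 63 pose parameters).
sample = MograsSample(
    scene_id="scene_0000",
    object_id="mug_01",
    walk_poses=np.zeros((120, 63)),
    walk_translations=np.zeros((120, 3)),
    grasp_pose=np.zeros(63),
    grasp_translation=np.zeros(3),
)
assert is_consistent(sample)
```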