MOGRAS: Human Motion with Grasping in 3D Scenes

📅 2025-10-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing methods for generating full-body motion in 3D scenes struggle to simultaneously ensure physical plausibility and fine-grained grasp fidelity: scene-aware models neglect hand-object interaction, while grasp-specific models disregard environmental context. To address this, we introduce MOGRAS, a large-scale benchmark featuring pre-grasping walking motions, final grasp poses, and semantically annotated 3D indoor scenes. We further propose a scene-adaptive method that adapts existing full-body grasping approaches to 3D scenes by integrating kinematic constraints with geometric and functional scene cues. Our approach significantly improves physical plausibility—reducing interpenetration collisions by 42.7%—and enhances visual realism, achieving a +38.5% user preference rate in perceptual studies. Quantitative evaluations and qualitative analyses consistently demonstrate superiority over state-of-the-art methods.

📝 Abstract
Generating realistic full-body motion interacting with objects is critical for applications in robotics, virtual reality, and human-computer interaction. While existing methods can generate full-body motion within 3D scenes, they often lack the fidelity for fine-grained tasks like object grasping. Conversely, methods that generate precise grasping motions typically ignore the surrounding 3D scene. This gap, generating full-body grasping motions that are physically plausible within a 3D scene, remains a significant challenge. To address this, we introduce MOGRAS (Human MOtion with GRAsping in 3D Scenes), a large-scale dataset that bridges this gap. MOGRAS provides pre-grasping full-body walking motions and final grasping poses within richly annotated 3D indoor scenes. We leverage MOGRAS to benchmark existing full-body grasping methods and demonstrate their limitations in scene-aware generation. Furthermore, we propose a simple yet effective method to adapt existing approaches to work seamlessly within 3D scenes. Through extensive quantitative and qualitative experiments, we validate the effectiveness of our dataset and highlight the significant improvements our proposed method achieves, paving the way for more realistic human-scene interactions.
Problem

Research questions and friction points this paper is trying to address.

Generate full-body grasping motions that are physically plausible within a 3D scene
Bridge the gap between scene-aware motion generation and fine-grained grasping
Address the limitations of existing methods in scene-aware grasping motion generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing MOGRAS, a large-scale dataset of grasping motions in richly annotated 3D scenes
Adapting existing full-body grasping methods to work within 3D scenes
Generating physically plausible, scene-aware grasping motions
Kunal Bhosikar
Machine Learning Lab, International Institute of Information Technology, Hyderabad, India
Siddharth Katageri
Machine Learning Lab, International Institute of Information Technology, Hyderabad, India
Vivek Madhavaram
Machine Learning Lab, International Institute of Information Technology, Hyderabad, India
Kai Han
Vision AI Lab, The University of Hong Kong, Pokfulam Road, Hong Kong
Charu Sharma
International Institute of Information Technology, Hyderabad, India
Geometric Machine Learning · Point Clouds · Graph Representation Learning · Optimal Transport