HOSIG: Full-Body Human-Object-Scene Interaction Generation with Hierarchical Scene Perception

📅 2025-06-02

🤖 AI Summary
Generating high-fidelity full-body interactions between a human, dynamic objects, and a static scene remains challenging because of interpenetration artifacts, inaccurate obstacle avoidance, and the difficulty of balancing fine-grained manipulation against long-horizon navigation. This paper introduces HOSIG, a framework for human-object-scene interaction generation built on three components: (1) scene-aware grasp pose generation guided by local geometric constraints to avoid human-object interpenetration; (2) dual-component spatial navigation driven by compressed 2D floor maps, which plans obstacle-avoiding paths through indoor scenes; and (3) a motion diffusion model that integrates spatial anchors with dual-space classifier-free guidance to produce trajectory-controlled motion with finger-level precision. Evaluated on the TRUMANS dataset, the method outperforms state-of-the-art approaches, supports unlimited motion length through autoregressive generation, and requires minimal manual intervention.
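
The summary does not spell out how the two guidance spaces are combined. As a rough illustration only, the sketch below shows generic classifier-free guidance with two conditioning signals, which is a common compositional form and not necessarily the paper's dual-space variant. The function name `cfg_combine`, the weights `w_a` and `w_b`, and the plain-list representation of noise predictions are all assumptions made for this example.

```python
from typing import List, Sequence


def cfg_combine(
    eps_uncond: Sequence[float],
    eps_cond_a: Sequence[float],
    eps_cond_b: Sequence[float],
    w_a: float,
    w_b: float,
) -> List[float]:
    """Generic two-condition classifier-free guidance (illustrative assumption).

    Computes eps = eps_u + w_a * (eps_a - eps_u) + w_b * (eps_b - eps_u),
    where eps_u is the unconditional prediction and eps_a / eps_b are
    predictions under two hypothetical conditioning signals.
    """
    return [
        u + w_a * (a - u) + w_b * (b - u)
        for u, a, b in zip(eps_uncond, eps_cond_a, eps_cond_b)
    ]


# Toy values: with both weights at zero the result is the unconditional output.
guided = cfg_combine([0.0, 1.0], [1.0, 1.0], [0.0, 3.0], w_a=2.0, w_b=0.5)
print(guided)
```

Separate weights let each condition be strengthened or weakened independently at sampling time without retraining, which is the usual motivation for this kind of guidance.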

📝 Abstract
Generating high-fidelity full-body human interactions with dynamic objects and static scenes remains a critical challenge in computer graphics and animation. Existing methods for human-object interaction often neglect scene context, leading to implausible penetrations, while human-scene interaction approaches struggle to coordinate fine-grained manipulations with long-range navigation. To address these limitations, we propose HOSIG, a novel framework for synthesizing full-body interactions through hierarchical scene perception. Our method decouples the task into three key components: 1) a scene-aware grasp pose generator that ensures collision-free whole-body postures with precise hand-object contact by integrating local geometry constraints, 2) a heuristic navigation algorithm that autonomously plans obstacle-avoiding paths in complex indoor environments via compressed 2D floor maps and dual-component spatial reasoning, and 3) a scene-guided motion diffusion model that generates trajectory-controlled, full-body motions with finger-level accuracy by incorporating spatial anchors and dual-space classifier-free guidance. Extensive experiments on the TRUMANS dataset demonstrate superior performance over state-of-the-art methods. Notably, our framework supports unlimited motion length through autoregressive generation and requires minimal manual intervention. This work bridges the critical gap between scene-aware navigation and dexterous object manipulation, advancing the frontier of embodied interaction synthesis. Codes will be available after publication. Project page: http://yw0208.github.io/hosig
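
The abstract states that unlimited motion length comes from autoregressive generation but gives no details. A minimal sketch of one common way to do this is shown below: generate a fixed-size window, then repeatedly generate the next chunk conditioned on the last few frames of what exists so far. The names `generate_long_motion`, `sample_window`, `window`, and `overlap` are hypothetical, and `dummy_sampler` merely stands in for a real motion model.

```python
from typing import Callable, List, Sequence

Frame = List[float]
# Hypothetical sampler signature: (context_frames, n_new_frames) -> new frames.
Sampler = Callable[[Sequence[Frame], int], List[Frame]]


def generate_long_motion(
    sample_window: Sampler, total_frames: int, window: int = 8, overlap: int = 2
) -> List[Frame]:
    """Chain fixed-size windows, conditioning each on the previous tail frames."""
    if not 0 < overlap < window:
        raise ValueError("overlap must satisfy 0 < overlap < window")
    # First window has no context.
    motion = list(sample_window([], window))
    while len(motion) < total_frames:
        context = motion[-overlap:]  # last frames seed the next chunk
        motion.extend(sample_window(context, window - overlap))
    return motion[:total_frames]


def dummy_sampler(context: Sequence[Frame], n: int) -> List[Frame]:
    """Stand-in model: continues a 1-D counter from the last context frame."""
    start = context[-1][0] + 1.0 if context else 0.0
    return [[start + i] for i in range(n)]


motion = generate_long_motion(dummy_sampler, total_frames=20)
print(len(motion))
```

Because each chunk only depends on a short tail of earlier frames, memory use stays constant no matter how long the final sequence is.
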
Problem

Research questions and friction points this paper is trying to address.

Generating realistic full-body human interactions with dynamic objects and static scenes
Addressing scene context neglect in human-object interaction methods
Coordinating fine-grained manipulations with long-range navigation in human-scene interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scene-aware grasp pose generator for collision-free postures
Heuristic navigation algorithm for obstacle-avoiding paths
Scene-guided motion diffusion model for trajectory-controlled motions
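
The page does not say which heuristic the navigation algorithm uses. To illustrate the general idea of planning obstacle-avoiding paths on a compressed 2D floor map, the sketch below runs plain A* search on an occupancy grid. A* is substituted here purely as a familiar example and is not claimed to be the paper's algorithm. The grid encoding (0 = free, 1 = blocked), 4-connected moves, and the name `astar` are assumptions.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """A* on a 2D occupancy grid (0 = free, 1 = blocked), 4-connected moves."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell: Cell) -> int:
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from: Dict[Cell, Cell] = {}
    best_cost: Dict[Cell, int] = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk parent links back to the start to rebuild the path.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None  # goal unreachable


# Toy floor map: a wall across the middle row with a gap on the right.
floor_map = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(floor_map, (0, 0), (2, 0))
print(path)
```

Flattening the scene to a 2D grid keeps the search space small, so the planned waypoints can then serve as coarse trajectory targets for a motion generator.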