Guiding Human-Object Interactions with Rich Geometry and Relations

📅 2025-03-26
🤖 AI Summary
Existing HOI synthesis methods often reduce an object to its centroid or to the point nearest the human, which discards geometric detail and can lead to physically implausible interactions and inaccurate relational modeling. To address this, the authors propose ROG, a diffusion-based framework aimed at high-fidelity interactive scenarios such as VR. First, boundary-focused and fine-detail key points are sampled from the object mesh and used to build an interactive distance field (IDF) that explicitly encodes the spatial relationship between human and object. Second, a diffusion-based relation model with spatial and temporal attention refines the IDF of the generated motion and guides generation toward relation-aware, semantically aligned movements. In the authors' experiments, ROG outperforms state-of-the-art methods in the realism and semantic accuracy of synthesized human-object interactions.

📝 Abstract
Human-object interaction (HOI) synthesis is crucial for creating immersive and realistic experiences for applications such as virtual reality. Existing methods often rely on simplified object representations, such as the object's centroid or the nearest point to a human, to achieve physically plausible motions. However, these approaches may overlook geometric complexity, resulting in suboptimal interaction fidelity. To address this limitation, we introduce ROG, a novel diffusion-based framework that models the spatiotemporal relationships inherent in HOIs with rich geometric detail. For efficient object representation, we select boundary-focused and fine-detail key points from the object mesh, ensuring a comprehensive depiction of the object's geometry. This representation is used to construct an interactive distance field (IDF), capturing the robust HOI dynamics. Furthermore, we develop a diffusion-based relation model that integrates spatial and temporal attention mechanisms, enabling a better understanding of intricate HOI relationships. This relation model refines the generated motion's IDF, guiding the motion generation process to produce relation-aware and semantically aligned movements. Experimental evaluations demonstrate that ROG significantly outperforms state-of-the-art methods in the realism and semantic accuracy of synthesized HOIs.
Problem

Research questions and friction points this paper is trying to address.

Enhancing human-object interaction realism with detailed geometry
Overcoming simplified object representation limitations in HOI synthesis
Improving motion generation with relation-aware spatial-temporal modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Boundary-focused key points for object representation
Interactive distance field capturing HOI dynamics
Diffusion-based relation model with attention mechanisms
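
The abstract does not give a formula for the interactive distance field, so the following is only a minimal sketch of one plausible reading: a per-frame matrix of Euclidean distances between human joints and sampled object key points. The function name `interaction_distance_field`, the input layout, and the toy coordinates are assumptions made for illustration, not the authors' implementation.

```python
import math


def interaction_distance_field(joints, keypoints):
    """Assumed IDF: per-frame joint-to-key-point distance matrices.

    joints:    list of frames, each a list of (x, y, z) human joint positions
    keypoints: list of frames, each a list of (x, y, z) object key points
    returns:   field[t][j][k], the Euclidean distance between joint j and
               key point k at frame t
    """
    if len(joints) != len(keypoints):
        raise ValueError("joints and keypoints must cover the same frames")
    return [
        [[math.dist(j, k) for k in frame_keypoints] for j in frame_joints]
        for frame_joints, frame_keypoints in zip(joints, keypoints)
    ]


# Toy data (hypothetical): 2 frames, 2 joints, 2 object key points.
joints = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
keypoints = [
    [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 1.0, 0.0)],
]
idf = interaction_distance_field(joints, keypoints)
```

Under this reading, a zero entry such as `idf[0][1][1]` marks a joint touching a key point, and the change of each entry across frames describes how the interaction evolves over time. A relation model of the kind the abstract describes would operate on a field of this shape.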
Mengqing Xue
South China University of Technology
Yifei Liu
South China University of Technology
Ling Guo
South China University of Technology
Shaoli Huang
Tencent AI-Lab
Deep learning, Computer Vision
Changxing Ding
Professor, South China University of Technology
Computer Vision, Embodied AI