Decoupled Generative Modeling for Human-Object Interaction Synthesis

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing HOI synthesis methods often rely on manually specified waypoints and place all optimization objectives on a single network, which leads to unsynchronized human and object motion and to penetration artifacts. This paper proposes DecHOI, a two-stage decoupled approach: a trajectory generator first produces human and object trajectories without prescribed waypoints, and an action generator then synthesizes detailed motions conditioned on those trajectories. Adversarial training with a discriminator focused on distal-joint dynamics improves contact realism, and the framework models a moving counterpart to support responsive, long-sequence planning in dynamic scenes while preserving plan consistency. On the FullBodyManipulation and 3D-FUTURE benchmarks, DecHOI surpasses prior methods on most quantitative metrics and in qualitative evaluations, and perceptual studies also favor its results.

📝 Abstract
Synthesizing realistic human-object interaction (HOI) is essential for 3D computer vision and robotics, underpinning animation and embodied control. Existing approaches often require manually specified intermediate waypoints and place all optimization objectives on a single network, which increases complexity, reduces flexibility, and leads to errors such as unsynchronized human and object motion or penetration. To address these issues, we propose Decoupled Generative Modeling for Human-Object Interaction Synthesis (DecHOI), which separates path planning and action synthesis. A trajectory generator first produces human and object trajectories without prescribed waypoints, and an action generator conditions on these paths to synthesize detailed motions. To further improve contact realism, we employ adversarial training with a discriminator that focuses on the dynamics of distal joints. The framework also models a moving counterpart and supports responsive, long-sequence planning in dynamic scenes, while preserving plan consistency. Across two benchmarks, FullBodyManipulation and 3D-FUTURE, DecHOI surpasses prior methods on most quantitative metrics and qualitative evaluations, and perceptual studies likewise prefer our results.
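The two-stage split described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the `Trajectories` container, and the use of linear interpolation and a fixed offset are hypothetical stand-ins, not DecHOI's learned generators. The point is only the data flow: stage 1 produces both trajectories from start and goal alone (no intermediate waypoints), and stage 2 consumes those trajectories.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Trajectories:
    human: List[Vec3]  # per-frame human root positions
    obj: List[Vec3]    # per-frame object positions


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linear interpolation between two 3D points."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def trajectory_generator(human_start: Vec3, obj_start: Vec3,
                         obj_goal: Vec3, num_frames: int) -> Trajectories:
    """Stage 1 stand-in (hypothetical): plan human and object paths jointly.

    A learned generative model would go here; this placeholder moves the
    object in a straight line and keeps the human at its initial offset.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    offset = tuple(h - o for h, o in zip(human_start, obj_start))
    obj_path = [lerp(obj_start, obj_goal, i / (num_frames - 1))
                for i in range(num_frames)]
    human_path = [tuple(p + d for p, d in zip(pos, offset))
                  for pos in obj_path]
    return Trajectories(human=human_path, obj=obj_path)


def action_generator(traj: Trajectories) -> List[Dict[str, Vec3]]:
    """Stage 2 stand-in (hypothetical): per-frame pose conditioned on paths.

    The placeholder "pose" is just a root position plus a hand pinned to
    the object, so contact follows directly from the planned trajectories.
    """
    return [{"root": h, "hand": o} for h, o in zip(traj.human, traj.obj)]


def synthesize(human_start: Vec3, obj_start: Vec3,
               obj_goal: Vec3, num_frames: int):
    """Run stage 1, then stage 2 on its output."""
    traj = trajectory_generator(human_start, obj_start, obj_goal, num_frames)
    return traj, action_generator(traj)
```

Because each stage has its own function, each can carry its own objective, which is the separation of concerns the abstract argues for.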
Problem

Research questions and friction points this paper is trying to address.

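Synthesizing realistic human-object interactions without manually specified intermediate waypoints.
Reducing errors such as unsynchronized human and object motion or penetration, which arise when a single network carries all optimization objectives.
Improving contact realism, addressed here with adversarial training on distal-joint dynamics.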
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples path planning and action synthesis
Uses adversarial training for contact realism
Models moving counterpart for dynamic scenes
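To make "dynamics of distal joints" concrete, the sketch below computes finite-difference velocities for a few end-of-chain joints from a sequence of joint positions. Features of this kind are one plausible input to a contact-focused discriminator. The joint names, the choice of wrists and ankles as the distal set, and the velocity feature itself are assumptions for illustration; the source does not specify which joints or features DecHOI's discriminator uses.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Dict[str, Vec3]  # joint name -> 3D position for one frame

# Assumed distal set (hypothetical): end-of-chain joints likely to touch objects.
DISTAL_JOINTS = ("left_wrist", "right_wrist", "left_ankle", "right_ankle")


def distal_joint_velocities(frames: Sequence[Frame],
                            fps: float = 30.0,
                            joints: Sequence[str] = DISTAL_JOINTS
                            ) -> List[Dict[str, Vec3]]:
    """Forward-difference velocity of each distal joint between frames.

    Returns one dict per consecutive frame pair, so the output has
    len(frames) - 1 entries.
    """
    dt = 1.0 / fps
    velocities = []
    for prev, curr in zip(frames, frames[1:]):
        velocities.append({
            name: tuple((c - p) / dt for p, c in zip(prev[name], curr[name]))
            for name in joints
        })
    return velocities
```

A discriminator fed only these features would judge motion by how hands and feet move rather than by the full-body pose, which is one way to read the emphasis on contact realism.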