🤖 AI Summary
This work addresses physically implausible reconstructions and inaccurate contact modeling in 3D human–object interaction (HOI) estimation. We propose a diffusion-based joint optimization framework that iteratively refines human and object pose estimates under physical and contact constraints. Specifically, our method employs a diffusion prior guided by image observations and object geometry, integrating physics-aware losses (including gravity compliance and collision avoidance) with contact-aware terms. Through controllable score-guided sampling and constraint-driven denoising, it jointly optimizes plausible distributions over both human and object states in an end-to-end differentiable manner. Our key innovation lies in the deep coupling of diffusion generative priors with explicit physical and contact modeling. Extensive experiments on the PROX and BEHAVE benchmarks demonstrate significant improvements over state-of-the-art methods in reconstruction accuracy and physical plausibility.
📝 Abstract
Joint reconstruction of human-object interaction marks a significant milestone in understanding the intricate interrelations between humans and their surrounding environment. However, previous optimization methods often struggle to produce physically plausible reconstructions because they lack prior knowledge about human-object interactions. In this paper, we introduce ScoreHOI, an effective diffusion-based optimizer that uses diffusion priors for the precise recovery of human-object interactions. By harnessing the controllability of score-guided sampling, the diffusion model can reconstruct a conditional distribution of human and object pose given the image observation and object feature. During inference, ScoreHOI improves the reconstruction results by guiding the denoising process with specific physical constraints. Furthermore, we propose a contact-driven iterative refinement approach to enhance contact plausibility and improve reconstruction accuracy. Extensive evaluations on standard benchmarks demonstrate ScoreHOI's superior performance over state-of-the-art methods, highlighting its ability to deliver precise and robust improvements in joint human-object interaction reconstruction.
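The abstract does not give the sampling equations, so the following is only a toy sketch of how a physical constraint can steer score-based sampling. It is not ScoreHOI's implementation. Every name in it (`noised_prior_score`, `ground_penalty_grad`, `guided_sampling`, `guidance_weight`) is hypothetical. An analytic Gaussian stands in for the learned, image-conditioned diffusion prior, a ground-plane penalty stands in for the physical constraints, and the update is a simple Langevin-style step chosen for illustration.

```python
import numpy as np

def noised_prior_score(x, prior_mean, prior_std, sigma):
    """Score of a Gaussian prior N(prior_mean, prior_std^2 I) after adding
    noise of level sigma. Assumption: stands in for a learned diffusion model."""
    return (prior_mean - x) / (prior_std**2 + sigma**2)

def ground_penalty_grad(x, ground_height=0.0):
    """Gradient of the toy loss 0.5 * min(x[-1] - ground_height, 0)^2, which
    penalizes the last coordinate (a height) for dropping below the ground."""
    grad = np.zeros_like(x)
    grad[-1] = min(x[-1] - ground_height, 0.0)
    return grad

def guided_sampling(x_init, prior_mean, noise_levels, prior_std=1.0,
                    guidance_weight=9.0, step=0.1, noise_scale=1.0, seed=0):
    """Langevin-style sampling in which the prior score is combined with the
    negative gradient of a constraint loss at every step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x_init, dtype=float).copy()
    prior_mean = np.asarray(prior_mean, dtype=float)
    for sigma in noise_levels:  # noise levels are expected to decrease
        score = noised_prior_score(x, prior_mean, prior_std, sigma)
        guidance = -guidance_weight * ground_penalty_grad(x)
        noise = noise_scale * np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        x = x + step * (score + guidance) + noise
    return x

if __name__ == "__main__":
    levels = np.linspace(1.0, 0.0, 50)
    # The toy prior prefers a height of -0.5, i.e. below the ground plane.
    start, mean = [0.0, 1.0], [0.0, -0.5]
    guided = guided_sampling(start, mean, levels, noise_scale=0.0)
    unguided = guided_sampling(start, mean, levels, guidance_weight=0.0,
                               noise_scale=0.0)
    print("guided height:  ", guided[-1])
    print("unguided height:", unguided[-1])
```

With the noise switched off, the unguided run settles near the prior's below-ground mean, while the guided run is held close to the ground plane. This is the basic effect constraint guidance is meant to have on the denoising trajectory.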