🤖 AI Summary
Existing Gaussian splatting methods struggle to model the interdependent motion of humans and objects in dynamic scenes. To address this limitation, this work proposes HOIGS, the first approach within the Gaussian splatting framework to explicitly model the coupled deformations induced by human-object interactions. HOIGS uses a cross-attention mechanism to capture interaction relationships and employs heterogeneous dynamic representations, HexPlane for humans and cubic Hermite splines (CHS) for objects, to account for their distinct deformation characteristics. This formulation targets challenging interaction scenarios such as occlusion, physical contact, and object manipulation. Experiments on multiple datasets show that HOIGS consistently outperforms state-of-the-art human-centric and 4D Gaussian methods, yielding higher-fidelity reconstruction of dynamic human-object interactions.
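To make the cross-attention idea concrete, here is a minimal sketch of single-head scaled dot-product cross-attention, in which human features act as queries and object features act as keys and values. It is an illustration under stated assumptions, not the HOI module from the paper: the function names, the toy feature vectors, and the absence of learned projections are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    Assumption: no learned query/key/value projections, purely illustrative.
    Each output row is a convex combination of the rows of `values`.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy features: 2 human tokens attend over 3 object tokens.
human_feats = [[1.0, 0.0], [0.0, 1.0]]
object_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(human_feats, object_feats, object_feats)
```

In this sketch each human token receives a blend of object features weighted by similarity, which is one simple way that features from one representation could condition the other.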
📝 Abstract
Reconstructing dynamic scenes with complex human-object interactions is a fundamental challenge in computer vision and graphics. Existing Gaussian Splatting methods either rely on human pose priors while neglecting dynamic objects, or approximate all motion with a single deformation field, which limits their ability to capture interaction-rich dynamics. To address this gap, we propose Human-Object Interaction Gaussian Splatting (HOIGS), which explicitly models interaction-induced deformation between humans and objects through a cross-attention-based HOI module. Features are extracted with a separate deformation representation for each entity type: HexPlane for humans and Cubic Hermite Spline (CHS) for objects. By integrating these heterogeneous features, HOIGS captures interdependent motions and improves deformation estimation in scenarios involving occlusion, contact, and object manipulation. Comprehensive experiments on multiple datasets demonstrate that our method consistently outperforms state-of-the-art human-centric and 4D Gaussian approaches, highlighting the importance of explicitly modeling human-object interactions for high-fidelity reconstruction.
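For readers unfamiliar with cubic Hermite splines, the sketch below evaluates a piecewise CHS trajectory from keyframe times, positions, and tangents using the standard Hermite basis functions. It only illustrates the general technique: the function name `eval_chs`, the keyframe layout, and the example values are assumptions, and the abstract does not specify how HOIGS parameterizes or learns its splines.

```python
from bisect import bisect_right


def eval_chs(times, points, tangents, t):
    """Evaluate a piecewise cubic Hermite spline at time t.

    times:    strictly increasing keyframe times
    points:   one position tuple per keyframe
    tangents: one velocity tuple per keyframe
    Assumption: t is clamped to the keyframe range (no extrapolation).
    """
    t = min(max(t, times[0]), times[-1])
    # Index of the segment [times[k], times[k + 1]] that contains t.
    k = min(bisect_right(times, t) - 1, len(times) - 2)
    dt = times[k + 1] - times[k]
    s = (t - times[k]) / dt  # normalized position within the segment, in [0, 1]

    # Standard cubic Hermite basis functions.
    s2, s3 = s * s, s * s * s
    h00 = 2 * s3 - 3 * s2 + 1
    h10 = s3 - 2 * s2 + s
    h01 = -2 * s3 + 3 * s2
    h11 = s3 - s2

    # Tangents are scaled by dt because s is a normalized parameter.
    return tuple(
        h00 * p0 + h10 * dt * m0 + h01 * p1 + h11 * dt * m1
        for p0, m0, p1, m1 in zip(
            points[k], tangents[k], points[k + 1], tangents[k + 1]
        )
    )


# Hypothetical object trajectory with three keyframes.
times = [0.0, 1.0, 2.0]
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
tangents = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 1.0, 0.0)]
position = eval_chs(times, points, tangents, 0.5)
```

The spline passes through every keyframe position and matches the given tangent there, so the resulting trajectory is smooth across segments, a property that plausibly suits rigid objects that move without articulated deformation.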