GASPACHO: Gaussian Splatting for Controllable Humans and Objects

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of controllable rendering for human-object interaction (HOI) scenes from multi-view RGB images. We propose the first deformable 3D Gaussian representation that jointly models both drivable humans and objects. Unlike prior methods, which reconstruct only the human body, our approach simultaneously learns Gaussian parameters for both human and object in a canonical space, incorporating linear deformation constraints and an occlusion-aware photometric loss to enable robust, decoupled reconstruction and editing under severe occlusion. Key technical contributions include: (1) manifold-constrained learning of canonical Gaussian parameters; (2) feature-guided construction of object UV templates; (3) multi-view geometric consistency regularization; and (4) differentiable 3D Gaussian splatting rendering. Evaluated on the BEHAVE and DNA-Rendering benchmarks, our method significantly improves reconstruction fidelity under occlusion and enhances the quality of novel-pose, novel-view HOI synthesis, enabling fine-grained, interactive control over human-object interactions.
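The summary states that posed Gaussians are constrained to be a linear function of a set of canonical Gaussians. The exact deformation model is not given on this page; below is a minimal NumPy sketch of one common choice for such a linear map, a blend-skinning-style weighted combination of rigid transforms applied to canonical Gaussian centers. All function and variable names are illustrative, not the paper's API.

```python
import numpy as np

def deform_gaussians(mu_canonical, blend_weights, transforms):
    """Linearly deform canonical Gaussian centers into a posed frame.

    mu_canonical:  (N, 3) canonical Gaussian centers
    blend_weights: (N, B) per-Gaussian weights over B rigid transforms
    transforms:    (B, 4, 4) pose-dependent rigid transforms

    Illustrative sketch: the posed center of each Gaussian is a linear
    function (a weighted blend of rigid transforms) of its canonical center.
    """
    n = len(mu_canonical)
    # Homogeneous coordinates of the canonical centers: (N, 4)
    mu_h = np.concatenate([mu_canonical, np.ones((n, 1))], axis=1)
    # Blend the transforms linearly per Gaussian: T_i = sum_b w_ib * G_b
    blended = np.einsum("nb,bij->nij", blend_weights, transforms)  # (N, 4, 4)
    # Apply each blended transform to its own center and drop the w-coordinate.
    return np.einsum("nij,nj->ni", blended, mu_h)[:, :3]
```

Changing `transforms` after training (new body pose, new object pose) re-poses the same canonical Gaussians, which is what enables novel-pose rendering without re-optimization.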

📝 Abstract
We present GASPACHO: a method for generating photorealistic, controllable renderings of human-object interactions. Given a set of multi-view RGB images of human-object interactions, our method simultaneously reconstructs animatable templates of the human and object as separate sets of Gaussians. Unlike existing work, which focuses on human reconstruction and treats objects as background, our method explicitly reconstructs both humans and objects, thereby allowing for controllable renderings of novel human-object interactions in different poses from novel camera viewpoints. During reconstruction, we constrain the Gaussians that generate rendered images to be a linear function of a set of canonical Gaussians. By simply changing the parameters of the linear deformation functions after training, our method can generate renderings of novel human-object interactions in novel poses from novel camera viewpoints. We learn the 3D Gaussian properties of the canonical Gaussians on the underlying 2D manifold of the canonical human and object templates. This in turn requires a canonical object template with a fixed UV unwrapping. To define such an object template, we use a feature-based representation to track the object across the multi-view sequence. We further propose an occlusion-aware photometric loss that allows for reconstruction under significant occlusions. Experiments on two human-object datasets, BEHAVE and DNA-Rendering, demonstrate that our method allows for high-quality reconstruction of human and object templates under significant occlusion and for the synthesis of controllable renderings of novel human-object interactions in novel human poses from novel camera views.
Problem

Research questions and friction points this paper is trying to address.

Reconstructs animatable human and object templates from multi-view RGB images.
Generates controllable renderings of novel human-object interactions in new poses.
Handles significant occlusions using occlusion-aware photometric loss.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructs humans and objects as separate Gaussians
Uses linear deformation for novel pose rendering
Implements occlusion-aware photometric loss
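The page mentions an occlusion-aware photometric loss but not its formulation. A minimal sketch of the general idea, assuming a binary visibility mask that marks pixels where the entity (human or object) is unoccluded, so the photometric residual is only accumulated where the rendering can actually be supervised (names and the L1 choice are illustrative, not the paper's exact loss):

```python
import numpy as np

def occlusion_aware_l1(rendered, target, visibility_mask):
    """L1 photometric loss restricted to visible pixels.

    rendered, target: (H, W, 3) float images
    visibility_mask:  (H, W) bool, True where the entity is unoccluded

    Illustrative sketch: occluded pixels contribute nothing, so the
    occluder does not corrupt the reconstruction of the occluded entity.
    """
    diff = np.abs(rendered - target)            # per-pixel residual
    masked = diff * visibility_mask[..., None]  # zero out occluded pixels
    denom = max(int(visibility_mask.sum()), 1)  # guard against an empty mask
    return masked.sum() / (denom * 3)           # mean over visible RGB values
```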
Aymen Mir
Huawei Noah's Ark Lab, London; University of Tübingen, Germany
Arthur Moreau
Huawei Noah's Ark Lab, London
3D computer vision, neural rendering, virtual humans, visual localization, pose estimation
Helisa Dhamo
Researcher in Computer Vision
computer vision, deep learning, layered depth images, scene graphs
Zhensong Zhang
Huawei Noah’s Ark Lab, London
Eduardo Pérez-Pellitero
Huawei Noah’s Ark Lab, London