🤖 AI Summary
Reconstructing dynamic scenes that involve multiple people and objects from sparse viewpoints is difficult because of severe occlusions and the complexity of modeling interactions between participants. This work proposes MM-GS, a framework presented as the first to extend 3D Gaussian Splatting to such scenarios. It uses a hierarchical representation with two modules: an instance-wise multi-view fusion module that enforces cross-view consistency for each instance, and a scene-level instance interaction module that reasons about participant relationships on a global scene graph and jointly refines geometric and appearance attributes. Evaluated on several challenging datasets, MM-GS substantially outperforms existing methods, reconstructing high-fidelity detail and physically plausible contacts between instances.
📝 Abstract
Reconstructing dynamic scenes with multiple interacting humans and objects from sparse-view inputs is a critical yet challenging task, essential for creating high-fidelity digital twins for robotics and VR/AR. This problem, which we term Multi-Human Multi-Object (MHMO) rendering, presents two significant obstacles: achieving view-consistent representations for individual instances under severe mutual occlusion, and explicitly modeling the complex and combinatorial dependencies that arise from their interactions. To overcome these challenges, we propose MM-GS, a novel hierarchical framework built upon 3D Gaussian Splatting. Our method first employs a Per-Instance Multi-View Fusion module to establish a robust and consistent representation for each instance by aggregating visual information across all available views. Subsequently, a Scene-Level Instance Interaction module operates on a global scene graph to reason about relationships between all participants, refining their attributes to capture subtle interaction effects. Extensive experiments on challenging datasets demonstrate that our method significantly outperforms strong baselines, producing state-of-the-art results with high-fidelity details and plausible inter-instance contacts.
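The abstract does not say how either module is implemented, so the following is only a toy illustration of the two-level data flow it describes, not the MM-GS method. It assumes each instance has a feature vector per view, uses visibility-weighted averaging as a stand-in for the Per-Instance Multi-View Fusion step, and uses a single mean-aggregation message-passing step over a scene graph as a stand-in for the Scene-Level Instance Interaction step. The function names `fuse_views` and `interact`, the visibility weights, and the `step` parameter are all hypothetical.

```python
import numpy as np


def fuse_views(view_feats, visibility):
    """Toy stand-in for per-instance multi-view fusion (assumption, not MM-GS).

    view_feats: (V, N, D) features of N instances seen from V views.
    visibility: (V, N) weights in [0, 1]; 0 means the instance is occluded.
    Returns one (N, D) feature per instance, averaged over visible views.
    """
    w = visibility[..., None]                     # (V, N, 1)
    total = w.sum(axis=0)                         # (N, 1)
    # Instances hidden in every view get a zero feature instead of NaN.
    return (w * view_feats).sum(axis=0) / np.maximum(total, 1e-8)


def interact(inst_feats, adjacency, step=0.5):
    """Toy stand-in for scene-level interaction (assumption, not MM-GS).

    inst_feats: (N, D) fused per-instance features.
    adjacency:  (N, N) scene-graph edges; nonzero means two instances interact.
    Blends each instance with the mean of its neighbours; isolated instances
    are returned unchanged.
    """
    adj = np.asarray(adjacency, dtype=float).copy()
    np.fill_diagonal(adj, 0.0)                    # ignore self-edges
    degree = adj.sum(axis=1, keepdims=True)       # (N, 1)
    neighbour_mean = adj @ inst_feats / np.maximum(degree, 1.0)
    blended = (1.0 - step) * inst_feats + step * neighbour_mean
    return np.where(degree > 0, blended, inst_feats)


# Two views, two instances, one feature channel.
view_feats = np.array([[[1.0], [10.0]],
                       [[3.0], [20.0]]])
visibility = np.array([[1.0, 0.0],               # instance 1 occluded in view 0
                       [1.0, 1.0]])
fused = fuse_views(view_feats, visibility)
refined = interact(fused, np.ones((2, 2)))
```

The point of the sketch is the ordering: per-instance features are made consistent across views first, and only then are relationships between instances used to adjust them.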