🤖 AI Summary
Problem: Reconstructing dynamic human-object interactions from monocular video suffers from structural incompleteness and temporal jitter, caused by mutual occlusion between the person and the object and by estimates that are inconsistent across frames.
Method: This paper proposes a template-free, temporally aware amodal completion framework. Its core innovation is the first explicit incorporation of cross-frame consistency constraints into dynamic interaction modeling: depth estimation, temporal feature propagation, and an amodal reasoning network are optimized jointly to drive end-to-end 3D Gaussian Splatting reconstruction. The method requires no prior human or object templates and produces temporally stable geometry and appearance in occluded regions.
Results: On multiple challenging monocular video sequences, the approach significantly improves both the recovery of detail under occlusion and inter-frame continuity, outperforming state-of-the-art methods in reconstruction quality and temporal stability.
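The summary does not state the exact form of the cross-frame consistency constraint. As a minimal sketch, assuming the constraint acts on 3D points tracked across frames (for example, Gaussian centers), one common way to suppress temporal jitter is to penalize the second-order difference x_{t+1} - 2 x_t + x_{t-1}, which leaves constant-velocity motion untouched. The function name `temporal_jitter_loss` and the track layout are hypothetical illustrations, not the authors' formulation.

```python
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def temporal_jitter_loss(tracks: Sequence[Sequence[Point3]]) -> float:
    """Mean squared second-order difference (acceleration) of tracked 3D points.

    Hypothetical helper: tracks[t][i] is the position of point i at frame t,
    and every frame is assumed to list the same points in the same order.
    Constant-velocity motion costs nothing; frame-to-frame jitter is penalized.
    """
    if len(tracks) < 3:
        return 0.0  # at least three frames are needed to measure acceleration
    total, count = 0.0, 0
    for prev, curr, nxt in zip(tracks, tracks[1:], tracks[2:]):
        for p, c, n in zip(prev, curr, nxt):
            # Squared norm of x_{t+1} - 2 x_t + x_{t-1} over the three coordinates
            total += sum((nn - 2.0 * cc + pp) ** 2 for pp, cc, nn in zip(p, c, n))
            count += 1
    return total / count if count else 0.0
```

A point moving steadily from x = 0 to 1 to 2 yields zero loss, whereas a point that moves to 1 and snaps back to 0 yields a loss of 4.0. Using acceleration rather than plain frame-to-frame displacement is a design choice here: penalizing displacement would also suppress the genuine motion that a dynamic interaction is supposed to capture.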
📝 Abstract
We introduce a novel framework for reconstructing dynamic human-object interactions from monocular video that overcomes challenges associated with occlusions and temporal inconsistencies. Traditional 3D reconstruction methods typically assume static objects or full visibility of dynamic subjects, so their performance degrades when these assumptions are violated, particularly when mutual occlusions occur. To address this, our framework leverages amodal completion to infer the complete structure of partially occluded regions. Unlike conventional approaches that operate on individual frames, our method integrates temporal context, enforcing coherence across video sequences to incrementally refine and stabilize reconstructions. This template-free strategy adapts to varying conditions without relying on predefined models, significantly enhancing the recovery of intricate details in dynamic scenes. We validate our approach using 3D Gaussian Splatting on challenging monocular videos, demonstrating superior precision in handling occlusions and maintaining temporal stability compared to existing techniques.
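The abstract names 3D Gaussian Splatting as the reconstruction representation but gives no rendering details. The sketch that follows shows the standard front-to-back alpha compositing that Gaussian Splatting uses to form a pixel from depth-sorted Gaussians, with blending weight w_i = alpha_i * prod_{j<i} (1 - alpha_j). It assumes each alpha_i has already been evaluated at the pixel (opacity times the projected 2D Gaussian falloff) and omits tile-based rasterization and early termination. `composite_front_to_back` is a hypothetical helper, not part of the authors' code or any library.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_front_to_back(
    colors: Sequence[RGB], alphas: Sequence[float]
) -> Tuple[RGB, List[float]]:
    """Blend Gaussians along one pixel ray, nearest first.

    Hypothetical helper: colors[i] and alphas[i] belong to the i-th Gaussian
    after depth sorting. Returns the pixel color and each Gaussian's weight
    w_i = alpha_i * prod_{j<i} (1 - alpha_j).
    """
    transmittance = 1.0  # fraction of light not yet absorbed by nearer Gaussians
    pixel = [0.0, 0.0, 0.0]
    weights: List[float] = []
    for color, alpha in zip(colors, alphas):
        w = alpha * transmittance
        weights.append(w)
        for k in range(3):
            pixel[k] += w * color[k]
        transmittance *= 1.0 - alpha  # nearer Gaussians occlude the ones behind
    return (pixel[0], pixel[1], pixel[2]), weights


# A red "hand" Gaussian in front of a blue "object" Gaussian on the same ray.
pixel, weights = composite_front_to_back([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.9, 0.8])
```

In this example the hand leaves a transmittance of about 0.1, so the object Gaussian behind it receives a weight of only about 0.08. Because the gradient of a photometric loss with respect to a Gaussian's color scales with its weight, occluded parts are only weakly constrained by the visible pixels, which is the gap that amodal completion is meant to fill.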