Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction

📅 2025-07-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Reconstructing dynamic human-object interactions from monocular video suffers from structural incompleteness caused by mutual occlusion and from temporal jitter caused by processing frames independently. Method: This paper proposes a template-free, temporally aware amodal completion framework. Its core innovation, described as the first of its kind, is the explicit incorporation of cross-frame consistency constraints into dynamic interaction modeling: depth estimation, temporal feature propagation, and an amodal reasoning network are jointly optimized to drive end-to-end 3D Gaussian Splatting reconstruction. The method requires no prior human or object templates and achieves temporally stable completion of geometry and appearance in occluded regions. Results: Evaluated on multiple challenging monocular video sequences, the approach significantly improves both detail recovery under occlusion and inter-frame continuity, and it outperforms state-of-the-art methods in reconstruction quality and temporal stability.

📝 Abstract

We introduce a novel framework for reconstructing dynamic human-object interactions from monocular video that overcomes challenges associated with occlusions and temporal inconsistencies. Traditional 3D reconstruction methods typically assume static objects or full visibility of dynamic subjects, leading to degraded performance when these assumptions are violated, particularly in scenarios where mutual occlusions occur. To address this, our framework leverages amodal completion to infer the complete structure of partially obscured regions. Unlike conventional approaches that operate on individual frames, our method integrates temporal context, enforcing coherence across video sequences to incrementally refine and stabilize reconstructions. This template-free strategy adapts to varying conditions without relying on predefined models, significantly enhancing the recovery of intricate details in dynamic scenes. We validate our approach using 3D Gaussian Splatting on challenging monocular videos, demonstrating superior precision in handling occlusions and maintaining temporal stability compared to existing techniques.
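To make the idea of enforcing coherence across frames concrete, here is a minimal sketch. It is not the paper's implementation. It assumes each frame's amodal mask is a flattened list of per-pixel probabilities and that pixels are already aligned across frames (no motion compensation). The names `temporal_inconsistency` and `ema_smooth` are hypothetical, and the exponential moving average merely stands in for whatever temporal propagation the authors actually use.

```python
from typing import List

Mask = List[float]  # flattened per-pixel amodal probabilities for one frame


def temporal_inconsistency(masks: List[Mask]) -> float:
    """Mean squared change between consecutive frames (lower is steadier)."""
    if len(masks) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, curr in zip(masks, masks[1:]):
        for p, c in zip(prev, curr):
            total += (c - p) ** 2
            count += 1
    return total / count


def ema_smooth(masks: List[Mask], alpha: float = 0.5) -> List[Mask]:
    """Blend each frame with the running estimate built from earlier frames."""
    if not masks:
        return []
    smoothed = [list(masks[0])]
    for curr in masks[1:]:
        prev = smoothed[-1]
        smoothed.append([alpha * c + (1.0 - alpha) * p for p, c in zip(prev, curr)])
    return smoothed


# Toy sequence of three 2-pixel frames: the first pixel flickers in the middle frame.
frames = [[1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
before = temporal_inconsistency(frames)
after = temporal_inconsistency(ema_smooth(frames))
print(before, after)
```

In this toy case the smoothed sequence scores lower on the inconsistency measure than the raw per-frame predictions, which is the kind of jitter reduction the abstract refers to. A real pipeline would have to warp or track pixels between frames before comparing them.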

Problem

Research questions and friction points this paper is trying to address.

Reconstructing dynamic human-object interactions from monocular video

Overcoming occlusions and temporal inconsistencies in 3D reconstruction

Inferring complete structure of partially obscured regions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Amodal completion for occluded regions

Temporal coherence across video sequences

Template-free 3D Gaussian Splatting
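As a small illustration of what amodal completion has to recover, the sketch below compares a visible mask with an amodal (full-extent) mask of the same object. It is an assumption-laden toy, not the paper's network. Masks are binary nested lists, and `occluded_region` and `occlusion_ratio` are hypothetical helper names.

```python
from typing import List

BinaryMask = List[List[int]]  # 1 = pixel belongs to the object, 0 = it does not


def occluded_region(amodal: BinaryMask, visible: BinaryMask) -> BinaryMask:
    """Pixels inside the full (amodal) shape that are hidden in the image."""
    return [
        [int(a == 1 and v == 0) for a, v in zip(amodal_row, visible_row)]
        for amodal_row, visible_row in zip(amodal, visible)
    ]


def occlusion_ratio(amodal: BinaryMask, visible: BinaryMask) -> float:
    """Fraction of the amodal shape that is occluded."""
    hidden = sum(map(sum, occluded_region(amodal, visible)))
    full = sum(map(sum, amodal))
    return hidden / full if full else 0.0


# Toy example: a 2x3 object whose right side is covered by another object.
amodal_mask = [[1, 1, 1],
               [1, 1, 1]]
visible_mask = [[1, 1, 0],
                [1, 0, 0]]
hidden_mask = occluded_region(amodal_mask, visible_mask)
ratio = occlusion_ratio(amodal_mask, visible_mask)
print(hidden_mask, ratio)
```

The hidden pixels are exactly the region whose geometry and appearance an amodal completion step must infer rather than observe.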
