Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of real-time monocular video-driven 3D dressed human reconstruction. Methodologically, it introduces a novel temporal propagation mechanism that reformulates pixel-aligned reconstruction networks into a streaming video processing paradigm; designs an updateable canonical appearance representation to enforce inter-frame consistency and enable lightweight fine-tuning; and integrates time-aware feature propagation, canonical-space coordinate mapping, and joint NeRF/implicit surface modeling. The contributions are threefold: (1) significantly improved inference speed—up to 12 FPS—without per-video optimization; (2) preservation of high-fidelity geometry and texture quality; and (3) state-of-the-art performance on standard benchmarks, demonstrating strong generalization across challenging poses and diverse clothing types.
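The core idea — maintaining a canonical appearance representation that is refined each frame through coordinate mapping, so later frames reuse rather than recompute earlier work — can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the function name `update_canonical`, the dense `(H, W, C)` buffer, and the exponential-moving-average blend are all assumptions standing in for the paper's learned features and mapping network.

```python
import numpy as np

def update_canonical(canonical, frame_feats, coords, alpha=0.2):
    """Blend per-frame pixel-aligned features into a canonical buffer.

    canonical   : (H, W, C) running canonical appearance buffer
    frame_feats : (N, C) features observed at N pixels in the current frame
    coords      : (N, 2) integer canonical-space (row, col) targets for those
                  pixels -- stand-in for the paper's coordinate mapping
    alpha       : blend weight for new observations (illustrative EMA; the
                  actual temporal propagation is learned, not a fixed average)
    """
    rows, cols = coords[:, 0], coords[:, 1]
    # Only the observed canonical cells are touched, so per-frame cost scales
    # with N rather than with a full re-reconstruction -- the source of the
    # claimed speedup over per-video optimization.
    canonical[rows, cols] = (
        (1.0 - alpha) * canonical[rows, cols] + alpha * frame_feats
    )
    return canonical
```

In a streaming loop, each incoming frame would contribute its pixel-aligned features to this shared buffer, which the downstream NeRF/implicit-surface decoder then reads from.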

📝 Abstract
Fast 3D clothed human reconstruction from monocular video remains a significant challenge in computer vision, particularly in balancing computational efficiency with reconstruction quality. Current approaches either focus on static image reconstruction but are too computationally intensive, or achieve high quality through per-video optimization that requires minutes to hours of processing, making them unsuitable for real-time applications. To this end, we present TemPoFast3D, a novel method that leverages the temporal coherency of human appearance to reduce redundant computation while maintaining reconstruction quality. Our approach is a "plug-and-play" solution that uniquely transforms pixel-aligned reconstruction networks to handle continuous video streams by maintaining and refining a canonical appearance representation through efficient coordinate mapping. Extensive experiments demonstrate that TemPoFast3D matches or exceeds state-of-the-art methods across standard metrics while providing high-quality textured reconstruction across diverse poses and appearances, with a maximum speed of 12 FPS.
Problem

Research questions and friction points this paper is trying to address.

Fast 3D human reconstruction from monocular video
Balancing computational efficiency with reconstruction quality
Real-time applications with high-quality textured reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages temporal coherency for efficient 3D reconstruction
Plug-and-play solution for continuous video streams
Maintains canonical appearance via efficient coordinate mapping