Patient4D: Temporally Consistent Patient Body Mesh Recovery from Monocular Operating Room Video

📅 2026-03-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the significant performance degradation of existing monocular 3D human mesh reconstruction methods in surgical settings, where severe occlusion, viewpoint changes, and domain shift are prevalent. To tackle these challenges, we propose Patient4D, the first approach to explicitly incorporate a patient stationarity prior, through Pose Locking and Rigid Fallback mechanisms that enforce inter-frame temporal consistency. Our method integrates vision foundation models, pose parameter anchoring, and silhouette-guided rigid alignment, and is compatible with existing reconstruction frameworks. Evaluated on 4,680 synthetic surgical sequences and three public benchmarks, Patient4D achieves an average IoU of 0.75 under surgical drape occlusion and reduces the failure-frame rate from 30.5% (best baseline) to 1.3%, substantially improving robustness and stability.
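The two headline numbers are a per-frame IoU averaged over a sequence and the share of frames counted as failures. The sketch below shows one way such metrics could be computed. It assumes that IoU is measured between boolean silhouette masks (predicted mesh silhouette vs. ground truth), that a frame with no prediction scores IoU 0, and that a failure frame is one with IoU < 0.5. That cutoff is hypothetical, since the summary does not state the paper's failure criterion. The function names are illustrative only.

```python
import numpy as np


def mask_iou(pred, gt) -> float:
    """IoU between a predicted and a ground-truth boolean silhouette mask.

    A frame with no prediction (pred is None) scores 0.0 (assumption).
    """
    if pred is None:
        return 0.0
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(np.logical_and(pred, gt).sum() / union)


def sequence_metrics(preds, gts, fail_thresh=0.5):
    """Return (mean IoU, failure-frame rate) for one sequence.

    fail_thresh is a hypothetical cutoff; the source does not define a failure frame.
    """
    ious = [mask_iou(p, g) for p, g in zip(preds, gts)]
    failure_rate = sum(iou < fail_thresh for iou in ious) / len(ious)
    return float(np.mean(ious)), failure_rate


gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True                      # 4-pixel ground-truth silhouette
half = np.zeros_like(gt)
half[1:3, 1] = True                      # covers 2 of the 4 pixels
mean_iou, fail_rate = sequence_metrics([gt, half, None], [gt, gt, gt])
print(round(mean_iou, 3), round(fail_rate, 3))  # 0.5 0.333
```

In this toy sequence, only the frame with no prediction counts as a failure, because the half-covered frame sits exactly at the 0.5 cutoff.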


📝 Abstract

Recovering a dense 3D body mesh from monocular video remains challenging under occlusion from surgical draping and with continuously moving camera viewpoints. This configuration arises in surgical augmented reality (AR), where an anesthetized patient lies under surgical draping while a surgeon's head-mounted camera continuously changes viewpoint. Existing human mesh recovery (HMR) methods are typically trained on upright, moving subjects captured from relatively stable cameras, so their performance degrades under these conditions. To address this, we present Patient4D, a stationarity-constrained reconstruction pipeline that explicitly exploits the prior that the patient remains stationary. The pipeline combines image-level foundation models for perception with lightweight geometric mechanisms that enforce temporal consistency across frames. Two key components enable robust reconstruction: Pose Locking, which anchors pose parameters using stable keyframes, and Rigid Fallback, which recovers meshes under severe occlusion through silhouette-guided rigid alignment. Together, these mechanisms stabilize predictions while remaining compatible with off-the-shelf HMR models. We evaluate Patient4D on 4,680 synthetic surgical sequences and three public HMR video benchmarks. Under surgical drape occlusion, Patient4D achieves a mean IoU of 0.75 and reduces the failure-frame rate from 30.5% (best baseline) to 1.3%. Our findings demonstrate that exploiting stationarity priors can substantially improve monocular reconstruction in clinical AR scenarios.
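Pose Locking and Rigid Fallback are described here only at a high level. The following is a minimal sketch of how the two mechanisms could fit together, not the authors' implementation. It makes several assumptions:

- "Stable keyframes" are frames whose HMR confidence is at least 0.8, and the locked pose is the element-wise mean of their pose parameters.
- Per-frame quality is the IoU between the HMR-rendered silhouette and an observed patient mask, which could come from a segmentation foundation model.
- The fallback triggers when that IoU falls below 0.5.
- Silhouette-guided rigid alignment is reduced to an exhaustive integer 2-D translation search that maximizes IoU.

The thresholds, the 2-D simplification, and the names `lock_pose`, `rigid_fallback`, and `reconstruct_sequence` are all hypothetical.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean masks of the same shape (1.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)


def lock_pose(poses, confidences, conf_thresh=0.8):
    """Pose Locking sketch: one pose shared by all frames (stationarity prior).

    Keyframes are assumed to be frames with confidence >= conf_thresh.
    Element-wise averaging of rotation parameters is a simplification.
    """
    poses = np.asarray(poses, dtype=float)
    conf = np.asarray(confidences, dtype=float)
    keyframes = conf >= conf_thresh
    if not keyframes.any():
        keyframes = conf == conf.max()  # no stable frame: use the most confident one
    return poses[keyframes].mean(axis=0)


def shift_mask(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate a boolean mask by (dy, dx) pixels; requires |dy| < H and |dx| < W."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        mask[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def rigid_fallback(template: np.ndarray, observed: np.ndarray, max_shift: int = 5):
    """Rigid Fallback sketch: exhaustive 2-D translation search maximizing the IoU
    between a reference silhouette and the observed patient mask."""
    best = (0, 0, -1.0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            iou = mask_iou(shift_mask(template, dy, dx), observed)
            if iou > best[2]:
                best = (dy, dx, iou)
    return best


def reconstruct_sequence(poses, confidences, rendered, observed,
                         conf_thresh=0.8, iou_thresh=0.5, max_shift=5):
    """Per-frame loop: locked pose everywhere, rigid fallback on poorly matching frames."""
    locked = lock_pose(poses, confidences, conf_thresh)
    template = rendered[int(np.argmax(confidences))]  # silhouette of most confident frame
    results = []
    for r, o in zip(rendered, observed):
        if mask_iou(r, o) >= iou_thresh:
            results.append(("locked", locked, (0, 0)))
        else:
            dy, dx, _ = rigid_fallback(template, o, max_shift)
            results.append(("rigid_fallback", locked, (dy, dx)))
    return results


sq = np.zeros((10, 10), dtype=bool)
sq[2:5, 2:5] = True
moved = shift_mask(sq, 2, 1)            # patient mask seen after a camera shift
out = reconstruct_sequence(
    poses=[[0.0, 0.0], [4.0, 4.0]], confidences=[0.9, 0.3],
    rendered=[sq, np.zeros_like(sq)],   # frame 1: HMR output lost under occlusion
    observed=[sq, moved])
print([(mode, offset) for mode, _, offset in out])
# [('locked', (0, 0)), ('rigid_fallback', (2, 1))]
```

The design choice this illustrates is that neither mechanism retrains the HMR model. Both only post-process its per-frame outputs, which is consistent with the claim that the pipeline remains compatible with off-the-shelf HMR models.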

Problem

Research questions and friction points this paper is trying to address.

monocular video

occlusion

temporal consistency

surgical augmented reality

3D body mesh recovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

stationarity prior

temporal consistency

monocular 3D reconstruction

surgical augmented reality

occlusion-robust mesh recovery

🔎 Similar Papers

Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes

2024-04-18 · Citations: 0

DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos

2023-03-23 · Citations: 0
