Patient4D: Temporally Consistent Patient Body Mesh Recovery from Monocular Operating Room Video

📅 2026-03-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the significant performance degradation of existing monocular 3D human mesh reconstruction methods in surgical settings, where severe occlusions, viewpoint variations, and domain shifts are prevalent. To tackle these challenges, we propose Patient4D, the first approach to explicitly incorporate a patient stationarity prior through Pose Locking and Rigid Fallback mechanisms that enforce inter-frame temporal consistency. The method integrates foundation vision models, pose parameter anchoring, and silhouette-guided rigid alignment, and is compatible with existing reconstruction frameworks. Evaluated on 4,680 synthetic surgical sequences and three public benchmarks, Patient4D achieves an average IoU of 0.75 under occlusion and reduces the failure frame rate from 30.5% to 1.3%, substantially improving robustness and stability.

📝 Abstract
Recovering a dense 3D body mesh from monocular video remains challenging under occlusion from draping and continuously moving camera viewpoints. This configuration arises in surgical augmented reality (AR), where an anesthetized patient lies under surgical draping while a surgeon's head-mounted camera continuously changes viewpoint. Existing human mesh recovery (HMR) methods are typically trained on upright, moving subjects captured from relatively stable cameras, leading to performance degradation under such conditions. To address this, we present Patient4D, a stationarity-constrained reconstruction pipeline that explicitly exploits the stationarity prior. The pipeline combines image-level foundation models for perception with lightweight geometric mechanisms that enforce temporal consistency across frames. Two key components enable robust reconstruction: Pose Locking, which anchors pose parameters using stable keyframes, and Rigid Fallback, which recovers meshes under severe occlusion through silhouette-guided rigid alignment. Together, these mechanisms stabilize predictions while remaining compatible with off-the-shelf HMR models. We evaluate Patient4D on 4,680 synthetic surgical sequences and three public HMR video benchmarks. Under surgical drape occlusion, Patient4D achieves a 0.75 mean IoU, reducing failure frames from 30.5% to 1.3% compared to the best baseline. Our findings demonstrate that exploiting stationarity priors can substantially improve monocular reconstruction in clinical AR scenarios.
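The two mechanisms described above can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: it assumes SMPL-style per-frame pose parameter vectors and a per-frame confidence score, and all names, thresholds, and the blending scheme are hypothetical. It shows the core idea of anchoring pose parameters on stable keyframes (Pose Locking) and reusing the anchored pose when a frame is too occluded to trust (a stand-in for Rigid Fallback, where only the rigid placement would be re-solved via silhouette alignment).

```python
import numpy as np

def pose_lock(poses, confidences, conf_thresh=0.8, blend=0.9):
    """Hypothetical sketch of a stationarity-prior pose update.

    poses:       (T, P) array of per-frame pose parameter vectors
                 (e.g. SMPL-style axis-angle parameters).
    confidences: (T,) per-frame detection/reconstruction confidence.
    Returns a (T, P) array of temporally anchored pose parameters.
    """
    anchor = None
    locked = []
    for pose, conf in zip(poses, confidences):
        if conf >= conf_thresh:
            # Stable keyframe: initialise or slowly update the anchor,
            # since the patient is assumed stationary.
            anchor = pose if anchor is None else blend * anchor + (1 - blend) * pose
            locked.append(anchor.copy())
        elif anchor is not None:
            # Unstable frame (e.g. heavy drape occlusion): fall back to the
            # locked anchor pose; in the paper only a rigid transform would
            # be re-estimated here via silhouette-guided alignment.
            locked.append(anchor.copy())
        else:
            # No anchor established yet: pass the raw prediction through.
            locked.append(pose)
    return np.stack(locked)
```

Under this sketch, frames failing the confidence check inherit the anchored pose, which is what suppresses the jitter and failure frames that per-frame HMR models produce under occlusion.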
Problem

Research questions and friction points this paper is trying to address.

monocular video
occlusion
temporal consistency
surgical augmented reality
3D body mesh recovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

stationarity prior
temporal consistency
monocular 3D reconstruction
surgical augmented reality
occlusion-robust mesh recovery