🤖 AI Summary
This work addresses the significant performance degradation of existing monocular 3D human mesh reconstruction methods in surgical settings, where severe occlusions, viewpoint variations, and domain shifts are prevalent. To tackle these challenges, we propose Patient4D, the first approach to explicitly incorporate a patient stationarity prior through Pose Locking and Rigid Fallback mechanisms that enforce temporal consistency across frames. Our method integrates foundation vision models, pose parameter anchoring, and silhouette-guided rigid alignment, and is compatible with existing reconstruction frameworks. We evaluate Patient4D on 4,680 synthetic surgical sequences and three public benchmarks. Under surgical drape occlusion, it achieves a mean IoU of 0.75 and reduces the failure frame rate from 30.5% for the best baseline to 1.3%, substantially improving robustness and stability.
📝 Abstract
Recovering a dense 3D body mesh from monocular video remains challenging when the subject is occluded by draping and the camera viewpoint moves continuously. This configuration arises in surgical augmented reality (AR), where an anesthetized patient lies under surgical draping while a surgeon's head-mounted camera continuously changes viewpoint. Existing human mesh recovery (HMR) methods are typically trained on upright, moving subjects captured from relatively stable cameras, so their performance degrades under such conditions. To address this, we present Patient4D, a reconstruction pipeline that explicitly exploits the prior that the patient remains stationary. The pipeline combines image-level foundation models for perception with lightweight geometric mechanisms that enforce temporal consistency across frames. Two key components enable robust reconstruction: Pose Locking, which anchors pose parameters using stable keyframes, and Rigid Fallback, which recovers meshes under severe occlusion through silhouette-guided rigid alignment. Together, these mechanisms stabilize predictions while remaining compatible with off-the-shelf HMR models. We evaluate Patient4D on 4,680 synthetic surgical sequences and three public HMR video benchmarks. Under surgical drape occlusion, Patient4D achieves a mean IoU of 0.75 and reduces failure frames from 30.5% for the best baseline to 1.3%. Our findings demonstrate that exploiting stationarity priors can substantially improve monocular reconstruction in clinical AR scenarios.
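
To make the pose-anchoring idea and the reported metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-frame pose parameters are stored as rows of an array and that each frame has a scalar stability score. The function names, the `min_score` cutoff, and the IoU `threshold` used to flag a failure frame are all hypothetical, since the abstract does not specify how keyframes are selected or how a failure frame is defined.

```python
import numpy as np


def silhouette_iou(pred_mask, gt_mask):
    """Intersection over union of two binary silhouette masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def failure_frame_rate(ious, threshold=0.5):
    """Fraction of frames whose IoU falls below a (hypothetical) threshold."""
    ious = np.asarray(ious, dtype=float)
    return float((ious < threshold).mean())


def lock_pose(per_frame_poses, stability_scores, min_score=0.8):
    """Anchor all frames to the pose of the most stable keyframe.

    Because the patient is assumed stationary, one reliable pose estimate
    can stand in for every frame. If no frame reaches min_score, the
    per-frame poses are returned unchanged.
    """
    poses = np.asarray(per_frame_poses, dtype=float)
    scores = np.asarray(stability_scores, dtype=float)
    best = int(np.argmax(scores))
    if scores[best] < min_score:
        return poses.copy()
    # Repeat the anchor pose once per frame.
    return np.tile(poses[best], (poses.shape[0], 1))
```

Picking a single best keyframe, rather than averaging several, sidesteps the question of how to average rotation parameters; a real pipeline might instead fuse several stable keyframes. Rigid Fallback is not sketched here, since the abstract gives too little detail on the alignment procedure.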