🤖 AI Summary
This work addresses multi-view 3D tracking in operating rooms, where inaccurate camera calibration and RGB-D registration often introduce geometric inconsistencies across views, causing fusion artifacts ("ghosting") and degrading trajectory accuracy in a shared coordinate frame. To mitigate this, the authors propose a two-stage decoupled approach. First, a Multi-view Metric Geometry Rectification module turns imprecise calibrations into a geometrically consistent camera setup with a single global metric scale. Second, occlusion-robust 3D point tracking is performed directly in the unified world coordinate frame. Decoupling geometric-consistency correction from tracking, a novel strategy in this domain, improves both fusion stability and tracking accuracy. On the MM-OR benchmark, the rectification front-end reduces cross-view depth disagreement by more than 30× relative to raw calibration, underscoring how much tracking performance depends on geometric consistency.
📝 Abstract
In operating rooms (ORs), world-scale multi-view 3D tracking supports downstream applications such as surgeon behavior recognition, where physically meaningful quantities, e.g., distances and motion statistics, must be measured in meters. However, real clinical deployments rarely satisfy the geometric prerequisites for stable multi-view fusion and tracking: camera calibration and RGB-D registration are often unreliable, and the resulting cross-view geometric inconsistency produces "ghosting" during fusion and degrades 3D trajectories in the shared OR coordinate frame. To address this, we introduce Geometry OR Tracker, a two-stage pipeline that first uses a Multi-view Metric Geometry Rectification module to rectify imprecise calibration into a geometrically consistent camera setup with a single global scale, and then performs Occlusion-Robust 3D Point Tracking directly in the unified OR world frame. On the MM-OR benchmark, improved geometric consistency translates into tracking gains: our rectification front-end reduces cross-view depth disagreement by more than 30× compared with raw calibration. Ablation studies further characterize the relationship between calibration quality and tracking accuracy, showing that better geometric consistency yields stronger world-frame tracking.
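The abstract does not spell out how cross-view depth disagreement is computed. Purely as an illustration, the sketch below assumes one common reading: back-project view A's depth map with its intrinsics, move the points into view B with an assumed A-to-B extrinsic transform, project them with a pinhole model, and average the absolute gap between the transformed depth and B's measured depth. The function names (`backproject`, `cross_view_depth_disagreement`) are hypothetical, the sketch ignores occlusion (no z-buffering), and it is not the authors' metric.

```python
import numpy as np


def backproject(depth, K):
    """Lift every valid pixel (depth > 0) to a 3D point in the camera frame (pinhole model)."""
    v, u = np.nonzero(depth > 0)  # row (v) and column (u) indices of valid pixels
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)


def cross_view_depth_disagreement(depth_a, K_a, T_a_to_b, depth_b, K_b):
    """Mean |z_b - D_b(pi(p_b))| over view-A points that land on valid view-B pixels.

    Hypothetical metric for illustration; occlusions are not handled.
    """
    pts_a = backproject(depth_a, K_a)
    R, t = T_a_to_b[:3, :3], T_a_to_b[:3, 3]
    pts_b = pts_a @ R.T + t  # express view-A points in the view-B camera frame
    in_front = pts_b[:, 2] > 1e-6  # drop points behind or on view B's image plane
    pts_b = pts_b[in_front]
    z_b = pts_b[:, 2]
    # Pinhole projection into view B, rounded to the nearest pixel.
    u_b = np.rint(K_b[0, 0] * pts_b[:, 0] / z_b + K_b[0, 2]).astype(int)
    v_b = np.rint(K_b[1, 1] * pts_b[:, 1] / z_b + K_b[1, 2]).astype(int)
    h, w = depth_b.shape
    inside = (u_b >= 0) & (u_b < w) & (v_b >= 0) & (v_b < h)
    u_b, v_b, z_b = u_b[inside], v_b[inside], z_b[inside]
    observed = depth_b[v_b, u_b]
    valid = observed > 0  # ignore holes in view B's depth map
    if not np.any(valid):
        return float("nan")
    return float(np.mean(np.abs(z_b[valid] - observed[valid])))


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 32.0], [0.0, 500.0, 24.0], [0.0, 0.0, 1.0]])
    depth_a = np.full((48, 64), 2.0)  # fronto-parallel plane 2 m in front of view A
    T = np.eye(4)
    T[2, 3] = 0.5  # view B sits 0.5 m behind view A along the optical axis
    print(cross_view_depth_disagreement(depth_a, K, T, np.full((48, 64), 2.5), K))  # 0.0
    print(cross_view_depth_disagreement(depth_a, K, T, np.full((48, 64), 2.0), K))  # 0.5
```

Under this reading, a consistent pair of views scores near zero, and a reduction factor like the reported 30× would be the ratio of the score under raw calibration to the score after rectification. A real evaluation would also need occlusion handling and a choice of which view pairs to average over.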