🤖 AI Summary
To address the challenge of automatic perioperative event recognition in operating rooms (ORs), where privacy constraints prohibit the use of raw RGB video, this paper proposes a privacy-preserving digital twin (DT) paradigm. First, vision foundation models transform raw RGB videos into de-identified depth maps and semantic segmentation maps. Second, a novel SafeOR dual-stream fusion network performs event detection directly on these DT representations. The work thus introduces a two-stage privacy-preserving framework integrating *generative DT modeling* with *dual-stream detection*. Evaluated on 38 simulated surgical cases, the method matches, and sometimes surpasses, models trained on raw video (mAP improvement of 2.3%). Critically, the DT representations are inherently privacy-safe, enable secure cross-institutional data sharing, and can substantially enhance model generalizability.
📝 Abstract
**Purpose:** The operating room (OR) is a complex environment where optimizing workflows is critical to reducing costs and improving patient outcomes. Computer vision approaches for the automatic recognition of perioperative events enable identification of bottlenecks for OR optimization. However, privacy concerns limit the use of computer vision for automated event detection from OR videos, making privacy-preserving approaches necessary for OR workflow analysis.

**Methods:** We propose a two-stage pipeline for privacy-preserving OR video analysis and event detection. In the first stage, we leverage vision foundation models for depth estimation and semantic segmentation to generate de-identified Digital Twins (DTs) of the OR from conventional RGB videos. In the second stage, we employ the SafeOR model, a fused two-stream approach that processes segmentation masks and depth maps for OR event detection. We evaluate this method on an internal dataset of 38 simulated surgical trials spanning five event classes.

**Results:** The DT-based event detection model achieves performance on par with, and sometimes better than, raw RGB video-based models on detecting OR events.

**Conclusion:** DTs enable privacy-preserving OR workflow analysis, facilitate the sharing of de-identified data across institutions, and can potentially enhance model generalizability by mitigating domain-specific appearance differences.
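The abstract does not spell out SafeOR's internals, but the core idea of a fused two-stream model over depth maps and segmentation masks can be sketched. The snippet below is a minimal, hypothetical NumPy illustration of late fusion (all weights, dimensions, and the linear-plus-ReLU streams are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def stream_features(x, w):
    """Toy per-stream feature extractor: linear map followed by ReLU.
    A real model would use a convolutional or transformer backbone."""
    return np.maximum(x @ w, 0.0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for one frame's de-identified DT inputs:
# a flattened depth-map embedding and a segmentation-map embedding.
depth = rng.random((1, 64))
seg = rng.random((1, 64))

# Illustrative weights: one extractor per stream, then a shared
# classifier over the five event classes mentioned in the abstract.
w_depth = rng.standard_normal((64, 32))
w_seg = rng.standard_normal((64, 32))
w_fuse = rng.standard_normal((64, 5))

# Late fusion: concatenate the two streams' features, then classify.
fused = np.concatenate(
    [stream_features(depth, w_depth), stream_features(seg, w_seg)],
    axis=-1,
)
probs = softmax(fused @ w_fuse)  # per-class event probabilities, shape (1, 5)
```

Late fusion is only one design point; the streams could equally be fused earlier (e.g., by stacking depth and segmentation as input channels), which trades per-modality specialization for cheaper joint feature learning.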