🤖 AI Summary
This study addresses the automatic localization of critical team communication events in the operating room (OR), specifically the Time-out and the StOP?-protocol, to improve patient safety and intra-operative support systems. Existing work lacks both multi-view data with fine-grained temporal annotations and models tailored to collaborative group dynamics. To bridge this gap, we introduce Team-OR, the first OR dataset with start and end time annotations for these team communication events, recorded with a multi-view camera system. We further propose a group activity detection approach that encodes both scene context and action features for temporal action detection. Evaluated on Team-OR, the method outperforms existing state-of-the-art temporal action detection models. The work thus provides both a benchmark dataset and a modeling approach for studying surgical team communication.
📝 Abstract
Purpose: Surgical performance depends not only on surgeons' technical skills but also on team communication within and across the different professional groups present during the operation. Automatically identifying team communication in the OR is therefore crucial for patient safety and for the development of computer-assisted surgical workflow analysis and intra-operative support systems. As a first step, we propose a new task: detecting communication briefings that involve all OR team members, i.e., the Team Time-out and the StOP?-protocol, by localizing their start and end times in video recordings of surgical operations. Methods: We build an OR dataset of real surgeries, called Team-OR, with more than one hundred hours of surgical videos captured by a multi-view camera system in the OR. The dataset contains temporal annotations of 33 Time-out and 22 StOP?-protocol activities in total. We then propose a novel group activity detection approach that encodes both scene context and action features and uses an efficient neural network model to output the detections. Results: Experiments on the Team-OR dataset show that our approach outperforms existing state-of-the-art temporal action detection approaches. They also highlight the lack of research on group activities in the OR, underscoring the significance of our dataset. Conclusion: We investigate the Team Time-out and the StOP?-protocol in the OR by presenting the first OR dataset with temporal annotations of group activity protocols and by introducing a novel group activity detection approach that outperforms existing approaches. Code is available at https://github.com/CAMMA-public/Team-OR.
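To make the task concrete, the sketch below shows one way a detected briefing could be represented as a labeled time segment and compared against an annotation using temporal intersection-over-union (tIoU), a criterion commonly used in temporal action detection. The abstract does not state which metric the paper uses, so this is an assumption, and the names `Segment` and `temporal_iou` as well as the timestamps are hypothetical and not taken from the Team-OR code.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A labeled activity interval in a video, in seconds (hypothetical type)."""
    label: str
    start: float
    end: float


def temporal_iou(a: Segment, b: Segment) -> float:
    """Return the temporal intersection-over-union of two segments."""
    # Overlap length is zero when the segments do not intersect.
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


# Illustrative values only: an annotated Time-out and a shifted prediction.
annotated = Segment("Time-out", 100.0, 160.0)
predicted = Segment("Time-out", 110.0, 170.0)
print(round(temporal_iou(annotated, predicted), 3))
```

Under this assumed criterion, a prediction would typically count as correct when its label matches the annotation and the tIoU exceeds a chosen threshold.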