🤖 AI Summary
Empirical modeling of clinical team collaboration remains underexplored; existing methods fail to capture the multimodal, highly interdependent, and low-resource interaction dynamics of real operating rooms. Method: We introduce CliniDial, a multimodal dialogue dataset of team collaboration collected from simulated medical operations, comprising audio with transcripts, dual-view video, and simulated patient physiological signals, all annotated with behavior codes from an existing teamwork framework. Contribution/Results: We identify three defining characteristics of the dataset (label imbalance, rich and natural interactions, and multiple modalities) and evaluate existing large language models against them. The experiments show that CliniDial poses significant challenges to these models. We open-source the codebase, providing a resource for advancing team collaboration understanding in medical AI.
📝 Abstract
In clinical operations, teamwork can be the crucial factor that determines the final outcome, and prior studies have identified sufficient collaboration as the key factor. To understand how teams practice teamwork during an operation, we collected CliniDial from simulations of medical operations. CliniDial includes audio recordings and their transcriptions, the simulated physiological signals of the patient manikins, and video of the team operating from two camera angles. We annotate CliniDial with behavior codes following an existing framework to understand the teamwork process. We pinpoint three main characteristics of our dataset (label imbalance, rich and natural interactions, and multiple modalities) and conduct experiments to test existing LLMs' ability to handle data with these characteristics. Experimental results show that CliniDial poses significant challenges to existing models, inviting future work on methods that can deal with real-world clinical data. We open-source the codebase at https://github.com/MichiganNLP/CliniDial
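To illustrate why label imbalance makes evaluation harder, the following minimal sketch compares accuracy with macro-averaged F1 for a classifier that always predicts the most frequent label. The behavior-code names (`inform`, `request`, `correct`) and the 90/8/2 split are hypothetical and are not taken from CliniDial, and the choice of macro-F1 is an assumption made here for illustration rather than the metric reported by the authors.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of utterances whose predicted code matches the gold code."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_label_f1(gold, pred):
    """F1 score computed separately for every label seen in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A label that is never predicted and never correct scores 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    scores = per_label_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical, heavily imbalanced gold annotations (not CliniDial data).
gold = ["inform"] * 90 + ["request"] * 8 + ["correct"] * 2

# Baseline that ignores the input and always predicts the majority label.
majority_label = Counter(gold).most_common(1)[0][0]
pred = [majority_label] * len(gold)

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
```

In this toy setting the majority baseline looks strong on accuracy but weak on macro-F1, because the two rare codes are never predicted. This is the kind of gap that an imbalanced annotation scheme can hide when only accuracy is reported.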