🤖 AI Summary
This study addresses the challenge of analyzing fine-grained collaborative behaviors in face-to-face hands-on learning without wearable devices, using only a single ceiling-mounted camera. Leveraging nursing simulation videos and seven instructor-defined observable behavior categories, the authors train a YOLO object detection model and then analyze the detected behavior labels jointly with their spatial context. They demonstrate that co-located learning engagement and team performance can be effectively assessed using solely single-view visual data. The model achieves an mAP@0.5 of 0.827 on the test set. Behavioral analysis reveals that high-performing teams predominantly focus on the primary workspace and patient interaction, whereas low-performing teams exhibit dispersed activity in secondary areas and frequent smartphone use, underscoring the critical role of joint behavior–space analysis in understanding collaboration quality.
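The study does not name its training framework or YOLO variant, so the following is only a minimal sketch of the detection step, assuming the `ultralytics` Python package, a pretrained `yolov8n.pt` checkpoint, and a hypothetical `behaviours.yaml` dataset config listing the seven behavior classes; all three names are assumptions, not details taken from the paper.

```python
# Sketch: fine-tune a YOLO detector on seven behaviour classes and report
# test-set metrics comparable to those quoted above. Paths, epochs, and the
# model variant are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained weights; the exact variant is an assumption

# behaviours.yaml would define the 7 class names and train/val/test image paths
model.train(data="behaviours.yaml", epochs=100, imgsz=640)

metrics = model.val(split="test")             # evaluate on the held-out test set
print(f"mAP@0.5  = {metrics.box.map50:.3f}")  # study reports 0.827
print(f"precision = {metrics.box.mp:.3f}")    # study reports 0.789
print(f"recall    = {metrics.box.mr:.3f}")    # study reports 0.784
```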
📝 Abstract
This study examined whether a single ceiling-mounted camera could be used to capture fine-grained learning behaviours in co-located practical learning. In undergraduate nursing simulations, teachers first identified seven observable behaviour categories, which were then used to train a YOLO-based detector. Video data were collected from 52 sessions, and analyses focused on Scenario A because it produced greater behavioural variation than Scenario B. Annotation reliability was high (F1 = 0.933). On the held-out test set, the model achieved a precision of 0.789, a recall of 0.784, and an mAP@0.5 of 0.827. When only behaviour frequencies were compared, no robust differences were found between high- and low-performing groups. However, when behaviour labels were analysed together with spatial context, clear differences emerged between teams with higher and lower task and collaboration performance. Higher-performing teams showed more patient interaction in the primary work area, whereas lower-performing teams showed more phone-related activity and more activity in secondary areas. These findings suggest that behavioural data are more informative when interpreted together with where they occur. Overall, the study shows that a single-camera computer vision approach can support the analysis of teamwork and task engagement in face-to-face practical learning without relying on wearable sensors.
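To make the behaviour–space idea concrete, here is a minimal sketch of how per-frame detections could be tallied by spatial zone; the zone rectangles, class names, and detection tuple format are illustrative assumptions, not the authors' implementation.

```python
# Sketch: assign each detection's box centre to a named zone in the
# ceiling-camera view, then tally behaviour-by-zone frequencies.
from collections import Counter

ZONES = {  # hypothetical pixel rectangles (x1, y1, x2, y2)
    "primary_work_area": (200, 100, 800, 500),
    "secondary_area": (800, 100, 1280, 500),
}

def zone_of(cx: float, cy: float) -> str:
    """Return the zone containing the point (cx, cy), or 'other'."""
    for name, (x1, y1, x2, y2) in ZONES.items():
        if x1 <= cx < x2 and y1 <= cy < y2:
            return name
    return "other"

def tally(detections) -> Counter:
    """detections: iterable of (behaviour_label, x1, y1, x2, y2) boxes."""
    counts = Counter()
    for label, x1, y1, x2, y2 in detections:
        counts[(label, zone_of((x1 + x2) / 2, (y1 + y2) / 2))] += 1
    return counts

# Example with two detections from one frame (hypothetical labels):
frame = [("patient_interaction", 300, 200, 400, 350),
         ("phone_use", 900, 150, 960, 260)]
print(tally(frame))
# Counter({('patient_interaction', 'primary_work_area'): 1,
#          ('phone_use', 'secondary_area'): 1})
```

Comparing such behaviour-by-zone counts between teams, rather than raw behaviour frequencies alone, mirrors the joint analysis that the abstract credits with revealing group differences.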