🤖 AI Summary
This study addresses casualty evacuation and frontline triage in high-risk conflict zones, where access is limited and conditions are hazardous. The authors propose ATRACT, a human-in-the-loop, multimodal decision support system that integrates behavioral cues from unmanned aerial vehicle (UAV) video with physiological signals from wearable sensors to support remote battlefield triage. To mitigate the data realism gap for injury-related actions, they devise a conditional variational autoencoder for data augmentation, and they use a lightweight CNN-based visual encoder to process UAV footage. On the authors' drone-captured dataset, the pipeline achieves 85.7% action classification accuracy, and the lightweight visual encoder remains competitive with stronger pre-trained video backbones. The authors present these results as a step towards remote triage in contested environments.
📝 Abstract
At a time when drones are increasingly associated with hostile operations, we repurpose them for humanitarian and life-saving applications. However, adapting search-and-rescue drones for battlefield triage remains extremely challenging: the technology must perform reliably to support frontline medics who are forced to operate under extreme uncertainty, restricted access, and significant personal risk. In response to the growing vulnerability of casualty evacuation in conflict zones, this paper presents ATRACT (A Trustworthy Robotic Autonomous system to support Casualty Triage), a novel human-in-the-loop decision support system that enables early battlefield triage during the critical post-trauma period. ATRACT integrates drone-captured video with wearable sensor input for multi-modal learning to support casualty-state assessment, thereby addressing the limitations of existing systems. Drone video captures fine-grained behavioural cues, such as pose and posture, while body-worn sensors provide complementary physiological signals, including heart rate, breathing rate, and movement. By combining the two modalities, ATRACT provides evidence to support the early judgement of medics when direct access to the casualty is delayed, risky, or restricted. To mitigate the data realism gap for injury-related actions, a conditional variational autoencoder is devised for data augmentation. Experimental results on our drone-captured dataset show that the proposed pipeline achieves 85.7% accuracy for action classification, while our lightweight CNN visual encoder remains competitive with stronger pre-trained video backbones. Overall, the results support ATRACT as a practically meaningful step towards remote triage in contested environments, where multi-modal sensing, human oversight, and trustworthy decision support can improve casualty prioritisation and lessen the exposure of frontline medics.