🤖 AI Summary
Current emergency medical services (EMS) lack real-time multimodal (audio and video) analysis before paramedics arrive, so dispatch and on-scene response remain manual, inefficient, and error-prone. To address this, we propose TeleEMS, a pre-arrival multimodal inference framework. First, we introduce EMSLlama, a domain-specialized large language model for EMS symptom extraction and normalization, which reaches an exact-match score of 0.89 against 0.57 for GPT-4o. Second, we design PreNet, a joint text-vital multitask model that predicts EMS protocols, medication types, medication quantities, and procedures. Third, we integrate remote photoplethysmography (rPPG) for video-based heart-rate estimation and EMS-Stream for smooth multi-party video streaming. TeleEMS runs on an edge-mobile collaborative architecture, with clients on phones, smart glasses, and desktops and the server at the edge. Evaluation shows that fusing text and vital signs improves inference robustness, supporting reliable pre-arrival intervention recommendations.
📝 Abstract
Timely and accurate pre-arrival video streaming and analytics are critical for emergency medical services (EMS) to deliver life-saving interventions. Yet, current-generation EMS infrastructure remains constrained by one-to-one video streaming and limited analytics capabilities, leaving dispatchers and EMTs to manually interpret overwhelming, often noisy or redundant information in high-stress environments. We present TeleEMS, a mobile live video analytics system that enables pre-arrival multimodal inference by fusing audio and video into a unified decision-making pipeline before EMTs arrive on scene.
TeleEMS comprises two key components: TeleEMS Client and TeleEMS Server. The TeleEMS Client runs across phones, smart glasses, and desktops to support bystanders, EMTs en route, and 911 dispatchers. The TeleEMS Server, deployed at the edge, integrates EMS-Stream, a communication backbone that enables smooth multi-party video streaming. On top of EMS-Stream, the server hosts three real-time analytics modules: (1) audio-to-symptom analytics via EMSLlama, a domain-specialized LLM for robust symptom extraction and normalization; (2) video-to-vital analytics using state-of-the-art rPPG methods for heart rate estimation; and (3) joint text-vital analytics via PreNet, a multimodal multitask model predicting EMS protocols, medication types, medication quantities, and procedures.
Evaluation shows that EMSLlama outperforms GPT-4o (exact-match 0.89 vs. 0.57) and that text-vital fusion improves inference robustness, enabling reliable pre-arrival intervention recommendations. TeleEMS demonstrates the potential of mobile live video analytics to transform EMS operations, bridging the gap between bystanders, dispatchers, and EMTs, and paving the way for next-generation intelligent EMS infrastructure.
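To make the exact-match figures above concrete, here is a minimal sketch of how such a score could be computed. It assumes that "exact match" means the predicted symptom set for a sample equals the reference set after simple normalization; the abstract does not define the metric, so this definition, the function names `normalize` and `exact_match`, and the sample data are all illustrative assumptions rather than the authors' evaluation code.

```python
def normalize(symptom: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplistic normalization)."""
    return " ".join(symptom.lower().split())


def exact_match(predicted: list[list[str]], reference: list[list[str]]) -> float:
    """Fraction of samples whose predicted symptom set equals the reference set.

    Assumption: order and duplicates are ignored, and a sample scores 1 only
    if every symptom matches, otherwise 0.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(
        {normalize(s) for s in pred} == {normalize(s) for s in ref}
        for pred, ref in zip(predicted, reference)
    )
    return hits / len(reference)


# Hypothetical data: three transcribed calls with extracted symptoms.
predicted = [["Chest Pain", "dyspnea"], ["nausea"], ["headache"]]
reference = [["dyspnea", "chest pain"], ["nausea", "vomiting"], ["headache"]]
score = exact_match(predicted, reference)  # 2 of 3 samples match exactly
```

Under this definition a partially correct prediction (the second sample) earns no credit, which is why exact match is a strict measure of extraction and normalization quality.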