🤖 AI Summary
This work addresses the high computational cost and latency of existing machine-assisted glottis detection systems, which hinder their applicability in emergency nasotracheal intubation (NTI) scenarios demanding real-time performance and low resource consumption. To this end, the authors propose Mobile GlottisNet, a lightweight framework that integrates an adaptive feature decoupling module, a hierarchical dynamic thresholding strategy, and a cross-layer dynamic weighted fusion mechanism. The design further incorporates deformable convolutions and dynamic sample assignment to achieve robust and accurate glottis localization under complex anatomical conditions. With a model size of only 5 MB, Mobile GlottisNet achieves over 62 FPS on-device inference and 33 FPS on edge platforms on both the PID and clinical datasets, effectively balancing accuracy and efficiency for deployment in resource-constrained emergency NTI settings.
📝 Abstract
Nasotracheal intubation (NTI) is a vital procedure in emergency airway management, where rapid and accurate glottis detection is essential to ensure patient safety. However, existing machine-assisted visual detection systems often rely on high-performance computational resources and suffer from significant inference delays, which limits their applicability in time-critical and resource-constrained scenarios. To overcome these limitations, we propose Mobile GlottisNet, a lightweight and efficient glottis detection framework designed for real-time inference on embedded and edge devices. The model incorporates structural awareness and spatial alignment mechanisms, enabling robust glottis localization under complex anatomical and visual conditions. We implement a hierarchical dynamic thresholding strategy to enhance sample assignment, and introduce an adaptive feature decoupling module based on deformable convolution to support dynamic spatial reconstruction. A cross-layer dynamic weighting scheme further facilitates the fusion of semantic and detail features across multiple scales. Experimental results on both our PID dataset and a clinical dataset demonstrate that the model, at a size of only 5 MB, achieves inference speeds of over 62 FPS on devices and 33 FPS on edge platforms, showing great potential for application in emergency NTI.
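The abstract names a dynamic thresholding strategy for sample assignment and a cross-layer dynamic weighted fusion scheme but does not specify either rule. As a rough illustration only (not the paper's actual method), the sketch below shows two common forms such components take in lightweight detectors: an ATSS-style per-object IoU threshold (mean plus standard deviation of candidate IoUs), and a BiFPN-style normalized weighted sum of feature maps. The function names and the specific formulas are assumptions for illustration.

```python
import numpy as np

def dynamic_threshold_assign(ious):
    """Mark candidate anchors as positive for one ground-truth box.

    Uses an ATSS-style dynamic threshold (mean + std of candidate IoUs)
    as a stand-in for the paper's unspecified hierarchical rule.
    ious: 1-D array of IoUs between candidate anchors and the box.
    Returns a boolean mask of positive candidates.
    """
    threshold = ious.mean() + ious.std()
    return ious >= threshold

def weighted_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with normalized non-negative weights.

    A BiFPN-style weighted sum, used here only to illustrate what
    "cross-layer dynamic weighted fusion" typically means; in a trained
    model the weights would be learnable per fusion node.
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)  # normalize so the weights sum to ~1
    return sum(wi * f for wi, f in zip(w, features))

# Anchors with clearly higher IoU than the rest become positives.
candidate_ious = np.array([0.10, 0.20, 0.65, 0.70, 0.15])
positives = dynamic_threshold_assign(candidate_ious)

# Two equally weighted maps fuse to (approximately) their average.
fused = weighted_fusion(
    [np.ones((2, 2)), 3 * np.ones((2, 2))],
    weights=[1.0, 1.0],
)
```

In practice the dynamic threshold is computed per ground-truth object over a small candidate set drawn from each pyramid level, which is what makes the assignment adapt to object scale without hand-tuned IoU cutoffs.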