🤖 AI Summary
To address the low efficiency and weak situational awareness of manual 2D video review in high-risk operational scenarios (e.g., disaster response, military exercises), this paper proposes ACT360, a system for automated action detection and structured post-hoc analysis using 360° video. Methodologically, it introduces 360YOWO, a 360° action detection model that extends You Only Watch Once (YOWO) with spatial attention and equirectangular-aware convolution (EAC); applies a lightweight deployment strategy (pruning plus quantization) that reduces model size by 74% while mAP drops by only 1.5 percentage points (from 0.865 to 0.850); and develops 360AIE, an LLM-driven interactive review interface supporting natural-language summarization and web-based visualization. The approach is validated on 55 labeled real-world 360° videos covering seven critical action classes, and the authors report significant gains in post-incident review efficiency.
📝 Abstract
Effective training and debriefing are critical in high-stakes, mission-critical environments such as disaster response, military simulations, and industrial safety, where precision and minimizing errors are paramount. Traditional post-training analysis relies on manually reviewing 2D videos, a time-consuming process that lacks comprehensive situational awareness. To address these limitations, we introduce ACT360, a system that leverages 360-degree videos and machine learning for automated action detection and structured debriefing. ACT360 integrates 360YOWO, an enhanced You Only Watch Once (YOWO) model with spatial attention and equirectangular-aware convolution (EAC) to mitigate panoramic video distortions. To enable deployment in resource-constrained environments, we apply quantization and model pruning, reducing the model size by 74% while maintaining robust accuracy (an mAP drop of only 1.5 percentage points, from 0.865 to 0.850) and improving inference speed. We validate our approach on a publicly available dataset of 55 labeled 360-degree videos covering seven key operational actions, recorded across various real-world training sessions and environmental conditions. Additionally, ACT360 integrates 360AIE (Action Insight Explorer), a web-based interface for automatic action detection, retrieval, and textual summarization using large language models (LLMs), significantly enhancing post-incident analysis efficiency. ACT360 serves as a generalized framework for mission-critical debriefing, incorporating EAC, spatial attention, summarization, and model optimization. These innovations apply to any training environment requiring lightweight action detection and structured post-exercise analysis.
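
The headline compression and accuracy figures can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names `map_drop` and `remaining_size_fraction` are hypothetical helpers, not part of ACT360, and the only inputs are the numbers quoted in the abstract (mAP 0.865 before and 0.850 after optimization, 74% size reduction). It shows that the reported "1.5" figure is the absolute difference in mAP, while the relative drop is slightly larger.

```python
def map_drop(before: float, after: float) -> tuple[float, float]:
    """Return (absolute mAP drop, relative drop as a fraction of `before`)."""
    absolute = before - after
    relative = absolute / before
    return absolute, relative


def remaining_size_fraction(reduction: float) -> float:
    """Fraction of the original model size left after a fractional reduction."""
    return 1.0 - reduction


# Figures quoted in the abstract; nothing else is assumed.
abs_drop, rel_drop = map_drop(0.865, 0.850)
remaining = remaining_size_fraction(0.74)

print(f"absolute mAP drop: {abs_drop:.3f}")      # 0.015, i.e. 1.5 percentage points
print(f"relative mAP drop: {rel_drop:.2%}")      # about 1.73% of the original mAP
print(f"remaining model size: {remaining:.0%} of original")
```

In other words, the optimized model keeps roughly a quarter of the original size while giving up under 2% of its original mAP in relative terms.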