MissionGNN: Hierarchical Multimodal GNN-Based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation

📅 2024-06-27
🏛️ IEEE Workshop/Winter Conference on Applications of Computer Vision
📈 Citations: 7
Influential: 1
🤖 AI Summary
Video anomaly detection (VAD) and recognition (VAR) suffer from scarce abnormal samples, costly frame-level annotation, and the difficulty of modeling under weak supervision. To address these challenges, this paper proposes a hierarchical multimodal graph neural network framework. Methodologically: (1) it introduces the first task-specific knowledge graph auto-generation mechanism, leveraging large language models (LLMs) to construct structured domain priors; (2) it designs a lightweight weakly supervised paradigm that avoids backpropagating gradients through LLMs, substantially reducing training overhead; and (3) it enables end-to-end frame-level modeling, removing the fixed segment-level constraints of conventional methods. Evaluated on mainstream benchmarks, including UCF-Crime and XD-Violence, the framework achieves significant improvements in both VAD and VAR performance while maintaining real-time inference efficiency. Extensive experiments demonstrate its strong generalization capability and practical value in intelligent surveillance and violent incident early warning applications.

📝 Abstract
In the context of escalating safety concerns across various domains, the tasks of Video Anomaly Detection (VAD) and Video Anomaly Recognition (VAR) have emerged as critically important for applications in intelligent surveillance, evidence investigation, violence alerting, etc. These tasks, aimed at identifying and classifying deviations from normal behavior in video data, face significant challenges due to the rarity of anomalies, which leads to extremely imbalanced data, and the impracticality of extensive frame-level data annotation for supervised learning. This paper introduces a novel hierarchical graph neural network (GNN) based model, MissionGNN, that addresses these challenges by leveraging a state-of-the-art large language model and a comprehensive knowledge graph for efficient weakly supervised learning in VAR. Our approach circumvents the limitations of previous methods by avoiding heavy gradient computations on large multimodal models and enabling fully frame-level training without fixed video segmentation. Utilizing automated, mission-specific knowledge graph generation, our model provides a practical and efficient solution for real-time video analysis without the constraints of previous segmentation-based or multimodal approaches. Experimental validation on benchmark datasets demonstrates our model's performance in VAD and VAR, highlighting its potential to redefine the landscape of anomaly detection and recognition in video surveillance systems.
Problem

Research questions and friction points this paper is trying to address.

Addresses video anomaly recognition with imbalanced data
Overcomes limitations of heavy multimodal model computations
Enables real-time analysis without fixed video segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical GNN for weakly supervised learning
Mission-specific knowledge graph generation
Frame-level training without fixed segmentation
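
The three ideas above can be illustrated with a toy sketch. This is not the authors' implementation. The graph layout, the node names (`sensor`, `punch`, `kick`, `fist`, `leg`, `output`), the mean-then-gate aggregation rule, and the logistic scoring head are all assumptions chosen for illustration. The sketch shows a per-frame embedding entering a small layered knowledge graph, messages flowing layer by layer through concept nodes whose embeddings stay fixed (standing in for outputs of a frozen pretrained encoder), and one anomaly score produced per frame with no fixed segmentation.

```python
import math

# Hypothetical layered knowledge graph for a single mission ("fighting").
# Layer 0 holds the frame embedding; the last layer collects the result.
LAYERS = [["sensor"], ["punch", "kick"], ["fist", "leg"], ["output"]]
PARENTS = {
    "punch": ["sensor"], "kick": ["sensor"],
    "fist": ["punch"], "leg": ["kick"],
    "output": ["fist", "leg"],
}

# Stand-ins for frozen concept embeddings (assumed values, never updated).
CONCEPTS = {
    "punch": [0.9, 0.1, 0.0],
    "kick": [0.1, 0.9, 0.0],
    "fist": [0.8, 0.0, 0.2],
    "leg": [0.0, 0.8, 0.2],
}


def propagate(frame_embedding, concepts=CONCEPTS):
    """Pass one frame embedding through the graph, layer by layer."""
    dim = len(frame_embedding)
    state = {"sensor": list(frame_embedding)}
    for layer in LAYERS[1:]:
        for node in layer:
            parents = PARENTS[node]
            # Mean-aggregate the messages arriving from the previous layer.
            mean = [sum(state[p][i] for p in parents) / len(parents)
                    for i in range(dim)]
            gate = concepts.get(node)
            if gate is None:
                # Nodes without a concept embedding only aggregate.
                state[node] = mean
            else:
                # Gate the message by the fixed concept embedding.
                state[node] = [math.tanh(m * g) for m, g in zip(mean, gate)]
    return state["output"]


def anomaly_score(features, weights, bias=0.0):
    """Small logistic head; in this sketch it is the only trainable part."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# One score per frame: no fixed-length segments are formed.
frames = [[0.0, 0.0, 0.0], [1.0, 2.0, -1.0]]
weights = [0.5, -0.2, 0.1]
scores = [anomaly_score(propagate(f), weights) for f in frames]
print(scores)
```

Because the concept embeddings are plain constants here, a training loop would only ever update `weights` and `bias`, which mirrors the idea of avoiding gradient computation through the large model.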