Video Summarisation with Incident and Context Information using Generative AI

Date: 2025-01-08
Citations: 0
Influential: 0
AI Summary
To address the low efficiency of large-scale surveillance video analysis and the weak support for event summarization and contextual description, this paper proposes a semantics-driven, fine-grained video summarization method. The approach integrates YOLOv8 for object detection with Google's Gemini multimodal large language model, using frame-level semantic alignment and prompt engineering to support user-defined queries and generate precise textual summaries focused on critical events and their context. Unlike conventional generic summarization or isolated action recognition methods, it enhances contextual awareness and interpretability. Experimental results show a semantic similarity score of 72.8% (measured via BERTScore) and a qualitative accuracy of 85%. The method also improves event localization efficiency and reduces the need for exhaustive manual review, enabling scalable and actionable video analytics.

Abstract
The proliferation of video content production has led to vast amounts of data, posing substantial challenges in terms of analysis efficiency and resource utilization. Addressing this issue calls for the development of robust video analysis tools. This paper proposes a novel approach leveraging Generative Artificial Intelligence (GenAI) to facilitate streamlined video analysis. Our tool aims to deliver tailored textual summaries of user-defined queries, offering a focused insight amidst extensive video datasets. Unlike conventional frameworks that offer generic summaries or limited action recognition, our method harnesses the power of GenAI to distil relevant information, enhancing analysis precision and efficiency. Employing YOLO-V8 for object detection and Gemini for comprehensive video and text analysis, our solution achieves heightened contextual accuracy. By combining YOLO with Gemini, our approach furnishes textual summaries extracted from extensive CCTV footage, enabling users to swiftly navigate and verify pertinent events without the need for exhaustive manual review. The quantitative evaluation revealed a similarity of 72.8%, while the qualitative assessment rated an accuracy of 85%, demonstrating the capability of the proposed method.
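
The abstract describes a two-stage flow: an object detector narrows the footage to relevant frames, and a generative model turns those frames plus a user query into a textual summary. The following is a minimal sketch of the glue between the two stages, not the authors' implementation. The detector and the language-model call are deliberately left out; `Detection`, `select_frames`, `build_prompt`, the 0.5 confidence threshold, and the prompt wording are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detector output (hypothetical structure, not from the paper)."""
    frame_index: int   # index of the frame in the video
    label: str         # detector class name, e.g. "person"
    confidence: float  # detector score in [0, 1]


def select_frames(detections, wanted_labels, min_confidence=0.5):
    """Return sorted frame indices that contain at least one wanted object."""
    wanted = {label.lower() for label in wanted_labels}
    return sorted({
        d.frame_index
        for d in detections
        if d.label.lower() in wanted and d.confidence >= min_confidence
    })


def build_prompt(query, detections, frame_indices, fps):
    """Compose a text prompt listing every label seen in each selected frame."""
    lines = [f"User query: {query}", "Detected objects by timestamp:"]
    for idx in frame_indices:
        labels = sorted({d.label for d in detections if d.frame_index == idx})
        # Convert the frame index to seconds (assumes a constant frame rate).
        lines.append(f"- t={idx / fps:.1f}s: {', '.join(labels)}")
    lines.append("Summarise the incidents relevant to the query and their context.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up detections standing in for real detector output.
    detections = [
        Detection(0, "car", 0.9),
        Detection(30, "person", 0.8),
        Detection(30, "bag", 0.7),
        Detection(60, "person", 0.3),  # below threshold, so frame 60 is skipped
    ]
    frames = select_frames(detections, ["person"])
    print(build_prompt("Did anyone leave a bag?", detections, frames, fps=30.0))
```

In a full system, the selected frames and the prompt would be passed to the multimodal model together. Filtering first keeps the model's input small, which is one plausible reading of how the approach avoids exhaustive manual review.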
Problem

Research questions and friction points this paper is trying to address.

Video Analysis
Efficiency
Event Summarization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI
YOLO-V8
Video Summarization