VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs

📅 2024-06-14
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing video large multimodal models (Video-LMMs) have advanced significantly in general video understanding, yet their capability for fine-grained anomaly detection remains systematically unassessed. To address this gap, we introduce VANE-Bench, the first dedicated fine-grained video anomaly detection benchmark for Video-LMMs, covering real-world scenarios such as deepfakes, traffic accidents, and criminal activities. We define five categories of synthetic anomalies (unnatural transformations, unnatural appearance, pass-through, disappearance, and sudden appearance), incorporate real-world samples from existing anomaly detection datasets (e.g., UCF-Crime, ShanghaiTech), and generate high-fidelity anomalous videos using state-of-the-art text-to-video diffusion models. All tasks are cast in a unified visual question answering evaluation paradigm. A comprehensive evaluation of nine open- and closed-source Video-LMMs reveals consistently poor performance on subtle anomalies. This work establishes a standardized evaluation framework and publicly releases the benchmark, code, and data to facilitate principled advances in Video-LMM robustness and reasoning.

📝 Abstract
The recent developments in Large Multi-modal Video Models (Video-LMMs) have significantly enhanced our ability to interpret and analyze video data. Despite their impressive capabilities, current Video-LMMs have not been evaluated for anomaly detection tasks, which is critical to their deployment in practical scenarios, e.g., identifying deepfakes, manipulated video content, traffic accidents, and crimes. In this paper, we introduce VANE-Bench, a benchmark designed to assess the proficiency of Video-LMMs in detecting and localizing anomalies and inconsistencies in videos. Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models, encompassing a variety of subtle anomalies and inconsistencies grouped into five categories: unnatural transformations, unnatural appearance, pass-through, disappearance, and sudden appearance. Additionally, our benchmark features real-world samples from existing anomaly detection datasets, focusing on crime-related irregularities, atypical pedestrian behavior, and unusual events. The task is structured as a visual question-answering challenge to gauge the models' ability to accurately detect and localize the anomalies within the videos. We evaluate nine existing Video-LMMs, both open- and closed-source, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies. In conclusion, our research offers significant insights into the current capabilities of Video-LMMs in the realm of anomaly detection, highlighting the importance of our work in evaluating and improving these models for real-world applications. Our code and data are available at https://hananshafi.github.io/vane-benchmark/
Problem

Research questions and friction points this paper is trying to address.

Evaluating Video-LMMs for anomaly detection in videos
Assessing models' ability to detect and localize subtle anomalies
Benchmarking performance on synthetic and real-world anomaly datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces VANE-Bench for anomaly detection evaluation
Uses synthetic and real-world video anomaly datasets
Evaluates Video-LMMs via visual question-answering tasks
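The VQA-style evaluation above boils down to scoring a model's multiple-choice answers against ground truth. A minimal sketch of that scoring step is shown below; the record keys (`prediction`, `answer`) and the answer-normalization rule are illustrative assumptions, not the paper's actual evaluation harness.

```python
def normalize(answer: str) -> str:
    """Normalize an answer string so variants like 'A)' and 'a' compare equal.

    (Hypothetical rule: lowercase, strip whitespace, drop trailing ')' or '.'.)
    """
    return answer.strip().lower().rstrip(").")


def vqa_accuracy(records) -> float:
    """Fraction of multiple-choice questions answered correctly.

    Each record is a dict with assumed keys 'prediction' (model output)
    and 'answer' (ground truth); the real benchmark's schema may differ.
    """
    if not records:
        return 0.0
    correct = sum(
        1 for r in records
        if normalize(r["prediction"]) == normalize(r["answer"])
    )
    return correct / len(records)


# Toy example with made-up data:
results = [
    {"prediction": "A) The car vanishes", "answer": "a) the car vanishes"},
    {"prediction": "B", "answer": "C"},
]
print(vqa_accuracy(results))  # 0.5
```

In practice a benchmark like this would also break accuracy down per anomaly category (e.g., pass-through vs. sudden appearance), since aggregate scores can hide which anomaly types a model fails on.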