Chain-of-Anomaly Thoughts with Large Vision-Language Models

📅 2025-12-23

🤖 AI Summary

Large vision-language models (VLMs) used for automated video surveillance have an inherent bias toward normality and often fail to detect anomalous behavior such as criminal acts. Chain-of-thought (CoT) reasoning does not fix this: because the reasoning lacks an inductive bias toward anomalies, it steers the model further toward normal interpretations and produces systematic false negatives. To address this, we propose Chain-of-Anomaly-Thoughts (CoAT), a multi-agent reasoning framework that injects an inductive criminal bias at the end of the reasoning chain through a final, anomaly-focused classification layer designed to counteract the normalcy bias. CoAT combines multi-agent collaborative reasoning with CoT-style reasoning in a VLM. In the reported experiments, CoAT improves the anomaly detection F1-score by 11.8 percentage points on low-resolution footage and anomaly classification by 3.78 percentage points on high-resolution videos.
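
The gains above are reported in percentage points (p.p.), meaning absolute differences between two scores rather than relative percentages. The Python sketch below uses made-up confusion counts (not from the paper) to compute F1 = 2PR / (P + R) and to contrast an absolute p.p. gain with the corresponding relative gain. The helper names `gain_pp` and `gain_relative` are illustrative.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def gain_pp(before: float, after: float) -> float:
    """Absolute gain in percentage points for scores in [0, 1]."""
    return 100 * (after - before)


def gain_relative(before: float, after: float) -> float:
    """Relative gain, in percent of the baseline score."""
    return 100 * (after - before) / before


# Hypothetical confusion counts for the anomalous class (not from the paper).
baseline = f1_score(tp=30, fp=10, fn=30)  # P = 0.75, R = 0.50 -> F1 = 0.60
improved = f1_score(tp=40, fp=10, fn=20)  # P = 0.80, R = 0.67 -> F1 ≈ 0.727

print(f"{gain_pp(baseline, improved):.2f} p.p.")     # 12.73 p.p.
print(f"{gain_relative(baseline, improved):.2f} %")  # 21.21 %
```

Both numbers describe the same change in F1. The summary's 11.8 and 3.78 figures are absolute differences of this first kind.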

📝 Abstract

Automated video surveillance with Large Vision-Language Models is limited by their inherent bias towards normality, often failing to detect crimes. While Chain-of-Thought reasoning strategies show significant potential for improving performance in language tasks, the lack of inductive anomaly biases in their reasoning further steers the models towards normal interpretations. To address this, we propose Chain-of-Anomaly-Thoughts (CoAT), a multi-agent reasoning framework that introduces inductive criminal bias in the reasoning process through a final, anomaly-focused classification layer. Our method significantly improves Anomaly Detection, boosting F1-score by 11.8 p.p. on challenging low-resolution footage and Anomaly Classification by 3.78 p.p. in high-resolution videos.

Problem

Research questions and friction points this paper is trying to address.

Detecting anomalies such as crimes in automated video surveillance

Overcoming the inherent bias of large vision-language models toward normal interpretations

Improving anomaly detection and anomaly classification performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent reasoning framework introduces inductive criminal bias

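Final, anomaly-focused classification layer counteracts the models' bias toward normality

Improves the anomaly detection F1-score by 11.8 p.p. on low-resolution footage and anomaly classification by 3.78 p.p. on high-resolution videos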
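
The abstract describes CoAT only at a high level. As a purely illustrative sketch, and not the authors' implementation, the Python below shows the general shape of a multi-agent chain whose last stage favors anomaly labels. Everything in it is hypothetical. The agents are string-building stubs in place of VLM calls. The label set and cue words (`ANOMALY_CUES`) are made up. The paper's inductive criminal bias is approximated by a swapped-in technique: a simple additive offset (`anomaly_bias`) applied to anomaly class scores before the final argmax.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical label set and cue words; not the paper's categories.
ANOMALY_CUES = {
    "assault": ["punch", "hit"],
    "robbery": ["grab", "snatch"],
    "vandalism": ["spray", "smash"],
}
NORMAL = "normal"
NORMAL_PRIOR = 1.0  # toy stand-in for the model's built-in preference for "normal"


@dataclass
class Trace:
    """Shared reasoning trace that each agent reads and extends."""
    clip: str
    notes: list[str] = field(default_factory=list)


Agent = Callable[[Trace], str]


def describe_agent(trace: Trace) -> str:
    # Hypothetical stub standing in for a VLM call that describes the clip.
    return f"scene: {trace.clip}"


def reasoning_agent(trace: Trace) -> str:
    # Hypothetical stub standing in for a chain-of-thought step over earlier notes.
    return "reasoning: " + " | ".join(trace.notes)


def score_classes(trace: Trace) -> dict[str, float]:
    # Toy scorer: one point per cue word found anywhere in the trace.
    text = " ".join(trace.notes).lower()
    scores = {NORMAL: NORMAL_PRIOR}  # inserted first, so exact ties resolve to "normal"
    for label, cues in ANOMALY_CUES.items():
        scores[label] = float(sum(cue in text for cue in cues))
    return scores


def anomaly_focused_classifier(trace: Trace, anomaly_bias: float) -> str:
    """Final stage: add `anomaly_bias` to every anomaly class, then take the argmax."""
    scores = score_classes(trace)
    adjusted = {
        label: score + (anomaly_bias if label != NORMAL else 0.0)
        for label, score in scores.items()
    }
    # max() returns the first maximal key, i.e. "normal" on an exact tie.
    return max(adjusted, key=adjusted.get)


def run_chain(clip: str, agents: list[Agent], anomaly_bias: float) -> str:
    # Each agent appends its note to the shared trace; the biased classifier runs last.
    trace = Trace(clip)
    for agent in agents:
        trace.notes.append(agent(trace))
    return anomaly_focused_classifier(trace, anomaly_bias)


if __name__ == "__main__":
    agents = [describe_agent, reasoning_agent]
    clip = "a person grabs a bag and runs"
    print(run_chain(clip, agents, anomaly_bias=0.0))  # normal (tie goes to normal)
    print(run_chain(clip, agents, anomaly_bias=0.5))  # robbery
    print(run_chain("people walk through a station", agents, anomaly_bias=0.5))  # normal
```

With `anomaly_bias=0.0`, the tie between `robbery` and the normal prior resolves to `normal`, which mimics the normalcy bias. A small positive bias flips the label to `robbery`, while a clip with no anomaly cues still comes out `normal`. This shows that a final-stage bias can shift borderline cases without relabeling everything as anomalous.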
