ChatAD: Reasoning-Enhanced Time-Series Anomaly Detection with Multi-Turn Instruction Evolution

📅 2026-01-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three limitations of current large language models (LLMs) in time-series anomaly detection: weak reasoning, no support for multi-turn dialogue, and poor generalization. The authors propose the multi-agent Time Series Evolution algorithm (TSEvol), which generates a multi-turn dialogue dataset, TSEData-20K, used to train a family of specialized models named ChatAD. They also introduce the TKTO optimization strategy, inspired by Kahneman and Tversky’s cognitive theories, to improve cross-task generalization, and present LLADBench, an LLM-driven benchmark for evaluating anomaly detection. In the reported experiments, ChatAD improves accuracy by up to 34.50% and F1 score by up to 34.71%, reduces false positives by 37.42%, and generalizes to classification, forecasting, and imputation tasks.

📝 Abstract

LLM-driven Anomaly Detection (AD) helps improve the understanding and explanation of anomalous behaviors in Time Series (TS). Existing methods face challenges of inadequate reasoning ability, deficient multi-turn dialogue capability, and narrow generalization. To this end, we 1) propose a multi-agent-based TS Evolution algorithm named TSEvol. On top of it, we 2) introduce the AD reasoning and multi-turn dialogue Dataset TSEData-20K and contribute the Chatbot family for AD, including ChatAD-Llama3-8B, Qwen2.5-7B, and Mistral-7B. Furthermore, we 3) propose the TS Kahneman-Tversky Optimization (TKTO) to enhance ChatAD's cross-task generalization capability. Lastly, we 4) propose an LLM-driven Learning-based AD Benchmark, LLADBench, to evaluate the performance of ChatAD and nine baselines across seven datasets and tasks. Our three ChatAD models achieve substantial gains: up to 34.50% in accuracy, 34.71% in F1, and a 37.42% reduction in false positives. In addition, via TKTO, our optimized ChatAD achieves competitive performance in reasoning and cross-task generalization on classification, forecasting, and imputation.
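For readers less familiar with the three metrics quoted in the abstract, the following is a minimal sketch of how accuracy, F1, and a false-positive rate are commonly computed from binary anomaly labels. It is not taken from the paper: the function name `detection_metrics`, the label convention (1 = anomaly), and the choice of FP / (FP + TN) as the false-positive measure are all assumptions made for illustration, and LLADBench may define its metrics differently.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, F1, and false-positive rate for binary labels (1 = anomaly).

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    # Assumed definition: share of normal points wrongly flagged as anomalous.
    negatives = fp + tn
    false_positive_rate = fp / negatives if negatives else 0.0

    return {
        "accuracy": accuracy,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    truth = [0, 0, 0, 1, 1, 0, 1, 0]
    preds = [0, 1, 0, 1, 0, 0, 1, 0]
    print(detection_metrics(truth, preds))
```

On the toy labels above there are 2 true positives, 4 true negatives, 1 false positive, and 1 false negative, so the accuracy is 0.75, the F1 score is 2/3, and the false-positive rate is 0.2.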

Problem

Research questions and friction points this paper is trying to address.

Time-Series Anomaly Detection

Reasoning Ability

Multi-Turn Dialogue

Generalization

LLM-driven Anomaly Detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent time-series evolution

reasoning-enhanced anomaly detection

multi-turn dialogue dataset