ChatAD: Reasoning-Enhanced Time-Series Anomaly Detection with Multi-Turn Instruction Evolution

📅 2026-01-20
🤖 AI Summary
This work addresses three limitations of current large language models (LLMs) in time-series anomaly detection: weak reasoning, no multi-turn dialogue mechanism, and poor generalization. The authors propose TSEvol, a multi-agent time-series evolution algorithm, and use it to generate the multi-turn dialogue dataset TSEData-20K, on which they train a specialized model named ChatAD. They also introduce the TKTO (TS Kahneman-Tversky Optimization) strategy, inspired by Kahneman and Tversky's cognitive theories, to improve cross-task generalization. The study further presents LLADBench, the first LLM-driven benchmark for anomaly detection evaluation. In the reported experiments, ChatAD improves accuracy by up to 34.50% and F1 score by up to 34.71%, reduces false positives by 37.42%, and generalizes across classification, forecasting, and imputation tasks.
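The summary does not spell out the TKTO objective. As background only, the sketch below shows the per-example loss of standard Kahneman-Tversky Optimization (KTO) from prior work, which the name TKTO refers to. It is not the authors' TKTO. The function name `kto_loss`, its parameters, and the default values are assumptions chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(log_ratio: float, desirable: bool, z_ref: float = 0.0,
             beta: float = 0.1, lambda_d: float = 1.0,
             lambda_u: float = 1.0) -> float:
    """Per-example loss of standard KTO (illustrative, not TKTO).

    log_ratio: log pi_theta(y|x) - log pi_ref(y|x), the implied reward.
    z_ref:     reference point (a KL estimate in KTO); 0.0 is an assumed default.
    beta, lambda_d, lambda_u: hyperparameters; the values here are assumptions.
    """
    if desirable:
        # Desirable outputs gain value as the reward rises above the reference.
        value = lambda_d * sigmoid(beta * (log_ratio - z_ref))
        return lambda_d - value
    # Undesirable outputs gain value as the reward falls below the reference.
    value = lambda_u * sigmoid(beta * (z_ref - log_ratio))
    return lambda_u - value
```

With these definitions, raising the log-ratio of a desirable response lowers its loss, while raising the log-ratio of an undesirable response increases it.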

📝 Abstract
LLM-driven Anomaly Detection (AD) improves the understanding and explanation of anomalous behaviors in Time Series (TS). Existing methods suffer from inadequate reasoning ability, deficient multi-turn dialogue capability, and narrow generalization. To this end, we 1) propose a multi-agent-based TS Evolution algorithm named TSEvol. On top of it, we 2) introduce the AD reasoning and multi-turn dialogue dataset TSEData-20K and contribute the Chatbot family for AD, including ChatAD-Llama3-8B, Qwen2.5-7B, and Mistral-7B. Furthermore, 3) we propose the TS Kahneman-Tversky Optimization (TKTO) to enhance ChatAD's cross-task generalization capability. Lastly, 4) we propose an LLM-driven Learning-based AD Benchmark, LLADBench, to evaluate the performance of ChatAD and nine baselines across seven datasets and tasks. Our three ChatAD models achieve substantial gains: up to 34.50% in accuracy, 34.71% in F1, and a 37.42% reduction in false positives. In addition, via TKTO, our optimized ChatAD achieves competitive performance in reasoning and cross-task generalization on classification, forecasting, and imputation.
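The abstract reports gains in accuracy, F1, and false positives but does not state how LLADBench computes them. As a reminder of the usual definitions, here is a minimal sketch for binary point-wise labels (1 = anomaly, 0 = normal). The function name `detection_metrics` and the toy labels are hypothetical, and the benchmark's actual protocol may differ, for example by scoring whole segments rather than points.

```python
def detection_metrics(y_true, y_pred):
    """Standard binary metrics; labels are 1 (anomaly) or 0 (normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # False positive rate: share of normal points flagged as anomalous.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "f1": f1,
            "false_positives": fp, "false_positive_rate": fpr}


# Toy example with made-up labels (not data from the paper).
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]
metrics = detection_metrics(y_true, y_pred)
```

On the toy labels, this gives two true positives, one false positive, one false negative, and four true negatives, so accuracy is 0.75 and the false positive rate is 0.2.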
Problem

Research questions and friction points this paper is trying to address.

Time-Series Anomaly Detection
Reasoning Ability
Multi-Turn Dialogue
Generalization
LLM-driven Anomaly Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent time-series evolution
reasoning-enhanced anomaly detection
multi-turn dialogue dataset
cross-task generalization
LLM-driven benchmark