AI Summary
This work addresses the limitations of current large language models (LLMs) in time series anomaly detection, particularly their weak reasoning capabilities, lack of multi-turn dialogue mechanisms, and poor generalization. To overcome these challenges, the authors propose the Multi-Agent Time Series Evolution algorithm (TSEvol). They use it to generate a high-quality multi-turn dialogue dataset, TSEData-20K, on which they train a family of specialized models named ChatAD. Additionally, they introduce the TKTO optimization strategy, inspired by Kahneman and Tversky's cognitive theories, to enhance cross-task generalization. The study also presents LLADBench, the first LLM-driven benchmark for anomaly detection evaluation. Experimental results show that ChatAD achieves improvements of up to 34.50% in accuracy and 34.71% in F1 score and reduces false positives by 37.42%. With TKTO, it also achieves competitive cross-task generalization on classification, forecasting, and imputation.
Abstract
LLM-driven Anomaly Detection (AD) helps improve the understanding and explanation of anomalous behaviors in Time Series (TS). Existing methods suffer from inadequate reasoning ability, deficient multi-turn dialogue capability, and narrow generalization. To this end, we 1) propose a multi-agent-based TS Evolution algorithm named TSEvol. On top of it, we 2) introduce TSEData-20K, an AD reasoning and multi-turn dialogue dataset, and contribute ChatAD, a chatbot family for AD that includes ChatAD-Llama3-8B, Qwen2.5-7B, and Mistral-7B. Furthermore, 3) we propose TS Kahneman-Tversky Optimization (TKTO) to enhance ChatAD's cross-task generalization capability. Lastly, 4) we propose LLADBench, an LLM-driven Learning-based AD Benchmark, to evaluate the performance of ChatAD and nine baselines across seven datasets and tasks. Our three ChatAD models achieve substantial gains of up to 34.50% in accuracy and 34.71% in F1, along with a 37.42% reduction in false positives. In addition, with TKTO, our optimized ChatAD achieves competitive performance in reasoning and cross-task generalization on classification, forecasting, and imputation.