Detecting Time Series Anomalies Like an Expert: A Multi-Agent LLM Framework with Specialized Analyzers

📅 2026-05-07

🤖 AI Summary

Existing large language model (LLM) approaches to time series anomaly detection often rely on a single general-purpose model to directly infer anomaly indices or intervals. This limits controllability, interpretability, and reliability on complex anomaly patterns. To address this, the work proposes SAGE (Specialized Analyzer Group for Expert-like Detection), a multi-agent framework that decomposes anomaly analysis in univariate time series into specialized subtasks. Four Analyzers, one each for point, structural, seasonal, and pattern anomalies, apply family-specific numerical tools and diagnostic visualizations to generate evidence. An evidence-grounded Detector consolidates this evidence into confidence-scored anomaly records with intervals and candidate types, and a Supervisor converts these records into analyst-facing diagnostic reports. SAGE synthesizes its in-context examples from normal-reference training segments, so no real anomalous segments or anomaly-type labels are used as in-context examples. Across three benchmarks, SAGE achieves the best average performance among strong machine learning, deep learning, and language-model-based baselines, and ablation studies and human evaluation indicate that the framework improves detection reliability and the practical usefulness of its diagnostic outputs.
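The division of labour described above can be illustrated with a small standard-library Python sketch. Family-specific analyzers emit numerical evidence, and a detector fuses that evidence into confidence-scored records with intervals and candidate types. This is not SAGE's implementation. In SAGE the Analyzers and Detector are LLM agents that also use visualizations, whereas here every analyzer is a plain statistical rule. All of the following are hypothetical choices made only for illustration: the names `Evidence`, `AnomalyRecord`, `detect`, the robust z-score tests, the window size `w`, the threshold `thr=3.5`, and the noisy-OR confidence.

```python
from dataclasses import dataclass
from math import prod
from statistics import fmean, median, pstdev


@dataclass
class Evidence:
    family: str   # "point", "structural", "seasonal" or "pattern"
    start: int    # first index of the suspicious interval (inclusive)
    end: int      # last index of the suspicious interval (inclusive)
    score: float  # |robust z-score|; larger means more anomalous


@dataclass
class AnomalyRecord:
    start: int
    end: int
    candidate_types: list  # every analyzer family that fired in this interval
    confidence: float      # in (0, 1]


def robust_z(values):
    """(v - median) / (1.4826 * MAD), falling back to the mean absolute
    deviation (or 1.0) when MAD is zero on very regular data."""
    med = median(values)
    dev = [abs(v - med) for v in values]
    mad = median(dev)
    scale = 1.4826 * mad if mad > 0 else (fmean(dev) or 1.0)
    return [(v - med) / scale for v in values]


def flag(family, scores, thr, span, offset=0):
    """Turn |z| > thr at position j into Evidence on [j+offset, j+offset+span-1]."""
    if not scores:
        return []
    return [Evidence(family, j + offset, j + offset + span - 1, abs(z))
            for j, z in enumerate(robust_z(scores)) if abs(z) > thr]


def point_analyzer(x, thr):
    # Isolated values far from the series median.
    return flag("point", x, thr, span=1)


def structural_analyzer(x, w, thr):
    # Level shifts: mean of the next w values minus mean of the previous w.
    diffs = [fmean(x[t:t + w]) - fmean(x[t - w:t]) for t in range(w, len(x) - w + 1)]
    return flag("structural", diffs, thr, span=w, offset=w)


def seasonal_analyzer(x, period, thr):
    # Breaks in periodicity: residual against the value one period earlier.
    resid = [x[i] - x[i - period] for i in range(period, len(x))]
    return flag("seasonal", resid, thr, span=1, offset=period)


def pattern_analyzer(x, w, thr):
    # Local shape changes, proxied here by an unusual rolling standard deviation.
    stds = [pstdev(x[t:t + w]) for t in range(len(x) - w + 1)]
    return flag("pattern", stds, thr, span=w)


def detect(x, period, w=4, thr=3.5):
    """Detector stand-in: merge overlapping evidence, score it with a noisy-OR."""
    evidence = (point_analyzer(x, thr) + structural_analyzer(x, w, thr)
                + seasonal_analyzer(x, period, thr) + pattern_analyzer(x, w, thr))
    evidence.sort(key=lambda e: (e.start, e.end))
    groups = []  # each entry: [start, end, evidence_list]
    for e in evidence:
        if groups and e.start <= groups[-1][1] + 1:
            groups[-1][1] = max(groups[-1][1], e.end)
            groups[-1][2].append(e)
        else:
            groups.append([e.start, e.end, [e]])
    records = []
    for start, end, items in groups:
        best = {}
        for e in items:
            best[e.family] = max(best.get(e.family, 0.0), e.score)
        # Each family contributes s / (s + thr); families are combined by noisy-OR.
        conf = 1.0 - prod(1.0 - s / (s + thr) for s in best.values())
        records.append(AnomalyRecord(start, end, sorted(best), conf))
    return records


series = [0, 1, 0, -1] * 6
series[10] = 8  # inject a spike into an otherwise periodic series
for rec in detect(series, period=4):
    print(rec)
```

Keeping every family that fired as a candidate type, rather than forcing a single label, mirrors the idea of records carrying "candidate types". A perfectly periodic input produces no records at all.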

📝 Abstract

Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly infer anomaly indices or intervals, limiting controllability, interpretability, and reliability for complex anomaly patterns. We propose SAGE (Specialized Analyzer Group for Expert-like Detection), a multi-agent framework for structured anomaly diagnosis in univariate time series. It decomposes anomaly analysis into four specialized Analyzers for point, structural, seasonal, and pattern anomalies. Each Analyzer applies family-specific numerical tools and diagnostic visualizations to generate evidence, while an evidence-grounded Detector consolidates the evidence into confidence-scored anomaly records with intervals and candidate types. A Supervisor then converts these structured records into analyst-facing diagnostic reports. SAGE further constructs synthetic in-context examples from normal-reference training segments, without using real anomalous segments or anomaly-type labels as in-context examples. Across three benchmarks, SAGE achieves the best average performance among strong ML/DL and language-model-based baselines. Ablation studies and human evaluation further show that the proposed framework improves detection reliability and the practical usefulness of diagnostic outputs.
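The abstract states that in-context examples are synthesized from normal-reference training segments rather than drawn from real anomalies, but it does not say how. One plausible reading is anomaly injection. A normal segment is copied and perturbed with a known anomaly of one family, and the injected interval and type are kept as the example's label. The sketch below assumes that reading. The injection recipes (spike size, shift size, amplitude factor, flatline window) and the names `make_icl_example`, `build_icl_examples`, and `FAMILIES` are hypothetical, not taken from the paper.

```python
import random
from statistics import fmean, pstdev

FAMILIES = ("point", "structural", "seasonal", "pattern")


def make_icl_example(normal, family, rng, period):
    """Copy a normal-reference segment and inject one synthetic anomaly.

    Returns (series, label); the label keeps the injected interval and type,
    which an in-context example could show the model (hypothetical format).
    `period` is required for the seasonal family.
    """
    x = list(normal)  # never modify the reference segment itself
    n = len(x)
    scale = pstdev(x) or 1.0
    if family == "point":
        # Single spike of 5 standard deviations.
        start = end = rng.randrange(n)
        x[start] += rng.choice((-1, 1)) * 5 * scale
    elif family == "structural":
        # Persistent level shift from a random point to the end of the segment.
        start, end = rng.randrange(n // 4, 3 * n // 4), n - 1
        shift = rng.choice((-1, 1)) * 3 * scale
        for i in range(start, n):
            x[i] += shift
    elif family == "seasonal":
        # Amplitude change lasting exactly one cycle.
        start = rng.randrange(n - period + 1)
        end = start + period - 1
        mu = fmean(x)
        for i in range(start, end + 1):
            x[i] = mu + 3 * (x[i] - mu)
    elif family == "pattern":
        # Flatline: the local shape is replaced by its own mean.
        w = max(3, n // 8)
        start = rng.randrange(n - w + 1)
        end = start + w - 1
        flat = fmean(x[start:end + 1])
        for i in range(start, end + 1):
            x[i] = flat
    else:
        raise ValueError(f"unknown anomaly family: {family!r}")
    return x, {"start": start, "end": end, "type": family}


def build_icl_examples(normal_segments, period, seed=0):
    """One synthetic example per (segment, family) pair, reproducible via seed."""
    rng = random.Random(seed)
    return [make_icl_example(seg, fam, rng, period)
            for seg in normal_segments for fam in FAMILIES]


normal = [0, 1, 0, -1] * 8
for series, label in build_icl_examples([normal], period=4):
    print(label)
```

Because every label comes from a controlled injection, the examples cover all four anomaly families without touching real anomalous segments or real anomaly-type labels, which is the property the abstract emphasizes.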

Problem

Research questions and friction points this paper is trying to address.

time series anomaly detection

large language models

interpretability

reliability

complex anomaly patterns

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework

specialized analyzers

time series anomaly detection