Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback

📅 2025-07-20
🤖 AI Summary
Existing time-series anomaly detection methods primarily perform binary classification, lacking fine-grained anomaly categorization and interpretable, reasoning-based explanations. Method: This paper introduces Time-RA, a novel generative reasoning task for anomaly detection, and presents RATs40K, the first multimodal real-world benchmark (40K samples across 10 domains) integrating time-series data, textual context, and visual charts, annotated with fine-grained anomaly types and structured reasoning descriptions. We propose a robust annotation framework combining ensemble label generation with GPT-4 feedback refinement, and conduct supervised fine-tuning and inference evaluation using large language models (LLMs) and multimodal LLMs (MLLMs). Contribution/Results: Comprehensive evaluation demonstrates that supervised fine-tuning substantially improves reasoning performance, while uncovering critical bottlenecks in causal attribution and cross-modal alignment. RATs40K establishes a new benchmark for interpretable, reasoning-driven anomaly detection and opens avenues for future research in explainable time-series AI.

📝 Abstract
Time series anomaly detection is critical across various domains, yet current approaches often limit analysis to mere binary anomaly classification without detailed categorization or further explanatory reasoning. To address these limitations, we propose a novel task, Time-series Reasoning for Anomaly (Time-RA) that transforms classical time series anomaly detection from a discriminative into a generative, reasoning-intensive task leveraging Large Language Models (LLMs). Also, we introduce the first real-world multimodal benchmark dataset, RATs40K, explicitly annotated for anomaly reasoning, comprising approximately 40,000 samples across 10 real-world domains. Each sample includes numeric time series data, contextual text information, and visual representations, each annotated with fine-grained categories (14 types for univariate anomalies and 6 for multivariate anomalies) and structured explanatory reasoning. We develop a sophisticated annotation framework utilizing ensemble-generated labels refined through GPT-4-driven feedback, ensuring accuracy and interpretability. Extensive benchmarking of LLMs and multimodal LLMs demonstrates the capabilities and limitations of current models, highlighting the critical role of supervised fine-tuning. Our dataset and task pave the way for significant advancements in interpretable time series anomaly detection and reasoning.
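To make the sample structure described in the abstract concrete, here is a minimal sketch of what one record and a text prompt built from it might look like. The class name, field names, `to_prompt` helper, and the example values are all hypothetical illustrations; they are not the dataset's actual schema, and the category string is a placeholder rather than one of the paper's anomaly types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReasoningSample:
    """Hypothetical record layout; field names are illustrative, not the dataset's schema."""
    series: List[List[float]]   # one inner list per variable (channel)
    context: str                # textual description of the domain or sensor
    chart_path: Optional[str]   # path to a rendered plot of the series, if any
    is_anomaly: bool            # binary detection label
    category: Optional[str]     # fine-grained anomaly type; None for normal samples
    reasoning: str              # structured explanation supporting the labels

    @property
    def is_multivariate(self) -> bool:
        # More than one channel means the multivariate category set would apply.
        return len(self.series) > 1


def to_prompt(sample: ReasoningSample) -> str:
    """Render the numeric and textual modalities as a plain-text prompt for an LLM."""
    lines = [f"Context: {sample.context}"]
    for i, channel in enumerate(sample.series):
        values = ", ".join(f"{v:.2f}" for v in channel)
        lines.append(f"Variable {i}: {values}")
    lines.append("Is this series anomalous? If so, name the anomaly type and explain why.")
    return "\n".join(lines)


# Made-up example values, not taken from the benchmark.
sample = ReasoningSample(
    series=[[1.0, 1.1, 0.9, 5.0, 1.0]],
    context="Hourly temperature readings from a single sensor (made-up example).",
    chart_path=None,
    is_anomaly=True,
    category="placeholder-category",
    reasoning="The fourth reading jumps well above its neighbours and returns immediately.",
)
prompt = to_prompt(sample)
print(prompt)
```

Keeping the detection label, the category, and the free-text reasoning as separate fields reflects the shift the abstract describes: a generative model is expected to produce all three, not only the binary flag.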
Problem

Research questions and friction points this paper is trying to address.

Enhances time series anomaly detection with detailed categorization and reasoning
Introduces multimodal dataset for anomaly reasoning with 40K annotated samples
Evaluates LLMs' capabilities and limitations in interpretable anomaly detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for generative anomaly reasoning
Introducing multimodal benchmark dataset RATs40K
Using GPT-4 feedback for accurate annotations
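As a rough illustration of the ensemble label generation step, the sketch below combines flags from several detectors by simple majority vote and reports how strongly the detectors agree on each sample. Majority voting, the function names, and the agreement score are assumptions made for this example; the source does not specify how the ensemble is formed or how samples are passed to the GPT-4 feedback stage.

```python
from typing import List


def majority_vote(detector_flags: List[List[bool]]) -> List[bool]:
    """Combine per-detector anomaly flags into one candidate label per sample.

    detector_flags[d][i] is detector d's flag for sample i.
    A sample is flagged when strictly more than half of the detectors flag it.
    (Assumed combination rule; the source does not state one.)
    """
    n_detectors = len(detector_flags)
    n_samples = len(detector_flags[0])
    labels = []
    for i in range(n_samples):
        votes = sum(flags[i] for flags in detector_flags)
        labels.append(votes * 2 > n_detectors)
    return labels


def agreement(detector_flags: List[List[bool]]) -> List[float]:
    """Fraction of detectors that agree with the majority label for each sample."""
    labels = majority_vote(detector_flags)
    n_detectors = len(detector_flags)
    scores = []
    for i, label in enumerate(labels):
        agreeing = sum(1 for flags in detector_flags if flags[i] == label)
        scores.append(agreeing / n_detectors)
    return scores


# Three hypothetical detectors scoring three samples.
flags = [
    [True, False, True],
    [True, False, False],
    [False, False, True],
]
labels = majority_vote(flags)
scores = agreement(flags)
print(labels, scores)
```

In a pipeline of this shape, low-agreement samples would be natural candidates for a later refinement pass, though whether the paper routes samples this way is not stated in the text above.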