🤖 AI Summary
Existing multimodal large language models struggle to perform fine-grained reasoning over the multidimensional structure inherent in time series data, limiting their ability to accurately classify, localize, and explain anomalies. To address this, this work proposes AnomSeer, a framework that pairs an expert-designed chain-of-thought grounded in classical time series analysis with TimerPO, a time-series grounded policy optimization that combines an optimal transport–driven temporal grounding advantage with an orthogonal projection mechanism. The projection injects the fine-grained grounding signal without interfering with the primary detection objective, yielding verifiable reasoning trajectories. Evaluated across diverse anomaly scenarios, AnomSeer surpasses larger commercial models such as GPT-4o in both classification and localization accuracy, performing especially well on point anomalies and frequency-domain anomalies, thereby establishing a unified and interpretable anomaly detection framework.
📝 Abstract
Time-series anomaly detection (TSAD) with multimodal large language models (MLLMs) is an emerging area, yet a persistent challenge remains: MLLMs rely on coarse time-series heuristics and struggle with the multi-dimensional, detailed reasoning that is vital for understanding complex time-series data. We present AnomSeer to address this by reinforcing the model to ground its reasoning in precise structural details of time series, unifying anomaly classification, localization, and explanation. At its core, an expert chain-of-thought trace is generated to provide verifiable, fine-grained reasoning derived from classical analyses (e.g., statistical measures, frequency transforms). Building on this, we propose a novel time-series grounded policy optimization (TimerPO) that adds two components to standard reinforcement learning: a time-series grounded advantage based on optimal transport, and an orthogonal projection that keeps this auxiliary granular signal from interfering with the primary detection objective. Across diverse anomaly scenarios, AnomSeer, with Qwen2.5-VL-3B/7B-Instruct, outperforms larger commercial baselines (e.g., GPT-4o) in classification and localization accuracy, particularly on point- and frequency-driven anomalies. Moreover, it produces plausible time-series reasoning traces that support its conclusions.
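The abstract does not spell out how the optimal-transport grounded advantage is computed. A minimal sketch of one plausible instantiation, assuming binary localization masks over time steps compared with the closed-form 1-D Wasserstein-1 distance (the function name, the exponential mapping, and its scale are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def temporal_grounding_advantage(pred_mask, true_mask):
    """Hypothetical sketch: reward predicted anomaly locations that are
    close, in 1-D optimal transport distance, to the ground-truth locations."""
    # Normalize the binary localization masks into distributions over time steps.
    p = pred_mask / pred_mask.sum()
    q = true_mask / true_mask.sum()
    # Closed-form 1-D Wasserstein-1 distance: L1 gap between the two CDFs.
    w1 = np.abs(np.cumsum(p) - np.cumsum(q)).sum()
    # Map distance to a bounded advantage in (0, 1]; the scale is an assumption.
    return np.exp(-w1 / len(pred_mask))
```

Under this sketch, a prediction overlapping the true anomaly span scores near 1, and the advantage decays smoothly as the predicted location drifts away, giving a denser learning signal than exact-match rewards.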
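The orthogonal projection that prevents the auxiliary granular signal from interfering with the primary detection objective can be sketched as projecting the auxiliary gradient onto the orthogonal complement of the primary gradient. This is a hedged illustration of the general technique (the function name and the exact gradients being projected are assumptions, not the paper's stated procedure):

```python
import numpy as np

def orthogonal_auxiliary_grad(g_main, g_aux, eps=1e-12):
    """Hypothetical sketch: strip from the auxiliary (grounding) gradient
    the component lying along the primary detection gradient, so the
    auxiliary update cannot push against the main objective."""
    # Scalar projection coefficient of g_aux onto g_main (eps avoids /0).
    coef = np.dot(g_aux, g_main) / (np.dot(g_main, g_main) + eps)
    # Subtract the parallel component; the result is orthogonal to g_main.
    return g_aux - coef * g_main
```

The returned vector has (numerically) zero inner product with the primary gradient, so adding it to the update leaves progress on the main objective unchanged to first order.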