VETime: Vision Enhanced Zero-Shot Time Series Anomaly Detection

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a core challenge in time series anomaly detection: point anomalies and contextual anomalies are difficult to capture simultaneously, and fine-grained localization often conflicts with global context awareness. To overcome this trade-off, the authors propose a zero-shot anomaly detection method that integrates the complementary strengths of one-dimensional temporal modeling and two-dimensional visual representation. A Reversible Image Conversion and a Patch-Level Temporal Alignment module construct a shared visual-temporal timeline. The approach further incorporates Anomaly Window Contrastive Learning and a Task-Adaptive Multi-Modal Fusion mechanism to achieve precise anomaly localization. Evaluated in a zero-shot setting, the method significantly outperforms current state-of-the-art models while incurring lower computational overhead than existing vision-based approaches.
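The summary's "Reversible Image Conversion" suggests a lossless mapping between a 1D series and a 2D grid. A minimal sketch of what such a round trip could look like, assuming a simple fold-and-pad scheme (the function names and the padding strategy here are hypothetical, not the paper's actual module):

```python
import numpy as np

def series_to_image(x, width):
    # Hypothetical fold of a 1D series of length T into a (height, width)
    # grid; edge-padding makes the fold total, and returning T makes it
    # reversible.
    T = len(x)
    height = -(-T // width)  # ceiling division
    padded = np.pad(x, (0, height * width - T), mode="edge")
    return padded.reshape(height, width), T

def image_to_series(img, T):
    # Invert the fold: flatten row-major and drop the padding, recovering
    # the original series exactly.
    return img.reshape(-1)[:T]

x = np.sin(np.linspace(0, 20, 500)) + 0.05 * np.random.randn(500)
img, T = series_to_image(x, width=32)
assert np.allclose(image_to_series(img, T), x)  # lossless round trip
```

Reversibility matters here because any information dropped in the series-to-image step would cap the fine-grained, pointwise localization the method aims for.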

📝 Abstract
Time-series anomaly detection (TSAD) requires identifying both immediate Point Anomalies and long-range Context Anomalies. However, existing foundation models face a fundamental trade-off: 1D temporal models provide fine-grained pointwise localization but lack a global contextual perspective, while 2D vision-based models capture global patterns but suffer from information bottlenecks due to a lack of temporal alignment and coarse-grained pointwise detection. To resolve this dilemma, we propose VETime, the first TSAD framework that unifies temporal and visual modalities through fine-grained visual-temporal alignment and dynamic fusion. VETime introduces a Reversible Image Conversion and a Patch-Level Temporal Alignment module to establish a shared visual-temporal timeline, preserving discriminative details while maintaining temporal sensitivity. Furthermore, we design an Anomaly Window Contrastive Learning mechanism and a Task-Adaptive Multi-Modal Fusion to adaptively integrate the complementary perceptual strengths of both modalities. Extensive experiments demonstrate that VETime significantly outperforms state-of-the-art models in zero-shot scenarios, achieving superior localization precision with lower computational overhead than current vision-based approaches. Code available at: https://github.com/yyyangcoder/VETime.
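The abstract's "Task-Adaptive Multi-Modal Fusion" implies a learned, per-step weighting of the two modalities. Below is a minimal, hypothetical sketch of one way such a fusion could be wired once both branches share a timeline; the class name and the sigmoid-gate design are assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    # Hypothetical gate: weighs temporal vs. visual features per time step.
    def __init__(self, d):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())

    def forward(self, h_temporal, h_visual):
        # Both inputs: (batch, time, d), aligned on a shared timeline.
        g = self.gate(torch.cat([h_temporal, h_visual], dim=-1))
        return g * h_temporal + (1 - g) * h_visual

fuse = GatedFusion(d=64)
out = fuse(torch.randn(2, 128, 64), torch.randn(2, 128, 64))
print(out.shape)  # torch.Size([2, 128, 64])
```

A per-step gate like this would let a detector lean on the temporal branch for sharp point anomalies and on the visual branch for long-range context anomalies, which is the complementarity the abstract describes.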
Problem

Research questions and friction points this paper is trying to address.

Time-series anomaly detection
Point Anomalies
Context Anomalies
Temporal alignment
Zero-shot learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-Shot Time Series Anomaly Detection
Visual-Temporal Alignment
Multi-Modal Fusion
Patch-Level Temporal Alignment
Anomaly Window Contrastive Learning (see the sketch below)
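As a sketch of the anomaly-window contrastive idea listed above: an InfoNCE-style loss can pull windows of the same class together in embedding space and push opposite-class windows apart. The function below is a generic illustration under that assumption, not the paper's formulation:

```python
import torch
import torch.nn.functional as F

def window_contrastive_loss(anchor, positives, negatives, tau=0.1):
    # Hypothetical InfoNCE-style loss over window embeddings:
    #   anchor    (d,)    one window's embedding
    #   positives (P, d)  same-class window embeddings
    #   negatives (N, d)  opposite-class window embeddings
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    logits = torch.cat([anchor @ pos.T, anchor @ neg.T]) / tau
    log_p = F.log_softmax(logits, dim=0)
    return -log_p[: positives.shape[0]].mean()  # average over positives

loss = window_contrastive_loss(
    torch.randn(64), torch.randn(4, 64), torch.randn(16, 64)
)
```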
Yingyuan Yang
Department of Industrial Engineering, Tsinghua University, Beijing, China
Tian Lan
Tsinghua University
Causal inference
Yifei Gao
Department of Industrial Engineering, Tsinghua University, Beijing, China
Yimeng Lu
Department of Industrial Engineering, Tsinghua University, Beijing, China
Wenjun He
2012 Lab, Huawei Technologies Ltd, Beijing, China
Meng Wang
2012 Lab, Huawei Technologies Ltd, Beijing, China
Chenghao Liu
Datadog AI Research
Chen Zhang
Tsinghua University
Industrial statistics, machine learning