AI Summary
To address the challenge of multivariate time series (MVTS) understanding and reasoning under scarce high-quality labeled data, this paper introduces ChatTS, the first multimodal large language model (MLLM) that treats time series as a native modality. Methodologically: (1) we design a native TS-MLLM architecture enabling end-to-end joint modeling of time series and text; (2) we propose an attribute-driven synthetic data generation framework coupled with Time Series Evol-Instruct, an instruction-evolution paradigm tailored to temporal data; (3) the model is fully fine-tuned on synthetic data and refined via cross-modal alignment distillation. Evaluated on six alignment and four complex reasoning tasks, ChatTS significantly outperforms vision-based MLLMs (e.g., GPT-4o) and text-only LLMs, achieving a 46.0% improvement in alignment accuracy and a 25.8% improvement in reasoning performance. This work provides the first empirical validation of synthetic-data-driven TS-MLLMs for low-resource MVTS understanding, demonstrating both effectiveness and strong generalization capability.
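The attribute-driven generation idea can be illustrated with a minimal sketch: sample a series from explicit attribute settings (trend, seasonality, noise, a local anomaly), then render those same settings as a textual description, yielding a perfectly aligned (series, text) pair. The attribute schema, field names, and description template below are illustrative assumptions, not the paper's actual generator.

```python
import numpy as np

def generate_series(attrs, length=256, seed=0):
    """Synthesize one univariate series from attribute settings.

    `attrs` uses an illustrative schema (not the paper's exact one):
    trend slope, seasonal amplitude/period, noise std, optional spike.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    series = attrs["trend_slope"] * t
    series = series + attrs["seasonal_amp"] * np.sin(2 * np.pi * t / attrs["seasonal_period"])
    series = series + rng.normal(0.0, attrs["noise_std"], length)
    if "spike_at" in attrs:  # inject a local anomaly attribute
        series[attrs["spike_at"]] += attrs["spike_height"]
    return series

def describe(attrs):
    """Render the attribute settings as text for alignment training."""
    direction = "upward" if attrs["trend_slope"] > 0 else "downward"
    parts = [
        f"an {direction} trend with slope {attrs['trend_slope']:.2f}",
        f"seasonality with period {attrs['seasonal_period']} "
        f"and amplitude {attrs['seasonal_amp']:.1f}",
    ]
    if "spike_at" in attrs:
        parts.append(
            f"a sudden spike of height {attrs['spike_height']:.1f} "
            f"at t={attrs['spike_at']}"
        )
    return "The series shows " + "; ".join(parts) + "."

# Build one aligned (series, description) training pair.
attrs = {"trend_slope": 0.05, "seasonal_amp": 2.0, "seasonal_period": 24,
         "noise_std": 0.3, "spike_at": 100, "spike_height": 8.0}
series, text = generate_series(attrs), describe(attrs)
```

Because both the numbers and the text are derived from the same attribute dictionary, the description is exact by construction, which is what makes fully synthetic alignment data viable at scale.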
Abstract
Understanding time series is crucial for their application in real-world scenarios. Recently, large language models (LLMs) have been increasingly applied to time series tasks, leveraging their strong language capabilities to enhance various applications. However, research on multimodal LLMs (MLLMs) for time series understanding and reasoning remains limited, primarily due to the scarcity of high-quality datasets that align time series with textual information. This paper introduces ChatTS, a novel MLLM designed for time series analysis. ChatTS treats time series as a modality, similar to how vision MLLMs process images, enabling it to perform both understanding and reasoning with time series. To address the scarcity of training data, we propose an attribute-based method for generating synthetic time series with detailed attribute descriptions. We further introduce Time Series Evol-Instruct, a novel approach that generates diverse time series Q&As, enhancing the model's reasoning capabilities. To the best of our knowledge, ChatTS is the first TS-MLLM that takes multivariate time series as input for understanding and reasoning, and it is fine-tuned exclusively on synthetic datasets. We evaluate its performance on benchmarks built from real-world data, covering six alignment tasks and four reasoning tasks. Our results show that ChatTS significantly outperforms existing vision-based MLLMs (e.g., GPT-4o) and text/agent-based LLMs, achieving a 46.0% improvement in alignment tasks and a 25.8% improvement in reasoning tasks.
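Evol-Instruct-style generation works by iteratively rewriting a seed question into harder or broader variants via evolution directives. A minimal sketch of such a loop, adapted to time series, is below; the directive texts are invented for illustration, and `call_llm` is an identity stub standing in for a real chat-completion backend (the paper's actual prompts are not reproduced here).

```python
import random

# Illustrative evolution directives in the spirit of Evol-Instruct,
# adapted to time series; NOT the paper's actual prompt set.
IN_DEPTH = [
    "Add a constraint that the answer must reference a specific time range.",
    "Require reasoning over the interaction of two variables in the series.",
]
IN_BREADTH = [
    "Rewrite the question to target a different attribute "
    "(e.g., seasonality instead of trend).",
]

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call; swap in a real backend."""
    return prompt  # identity stub so the sketch runs standalone

def evolve(question: str, rounds: int = 2, seed: int = 0) -> list:
    """Iteratively rewrite a seed time series question into new variants."""
    rng = random.Random(seed)
    variants, current = [], question
    for _ in range(rounds):
        directive = rng.choice(IN_DEPTH + IN_BREADTH)
        prompt = f"{directive}\nOriginal question: {current}"
        current = call_llm(prompt)
        variants.append(current)
    return variants
```

Each round conditions on the previous variant, so difficulty and diversity compound across rounds; in practice the evolved Q&As would then be filtered for consistency with the known attributes of the underlying synthetic series.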