TS-MLLM: A Multi-Modal Large Language Model-based Framework for Industrial Time-Series Big Data Analysis

📅 2026-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing industrial time-series analysis methods, which are predominantly confined to unimodal modeling and struggle to effectively integrate time-domain signals, frequency-domain images, and textual knowledge—thereby constraining predictive performance in equipment health monitoring. To overcome this, we propose the first tri-modal large language model framework tailored for industrial time-series data. Our approach leverages temporal chunking, a spectrum-aware vision-language alignment mechanism (SVLMA), and a temporal-centric multimodal attention fusion (TMAF) module to achieve deep alignment and collaborative modeling across time-series, spectral images, and text. Evaluated on multiple industrial benchmarks, the proposed method significantly outperforms state-of-the-art techniques, demonstrating superior robustness, generalization, and computational efficiency—particularly under few-shot and complex operational scenarios.
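The "temporal chunking" mentioned above follows the common practice of splitting a long signal into fixed-length patches so the language-model backbone sees a short token sequence rather than raw samples. A minimal sketch, assuming illustrative patch length and stride (not the paper's actual settings):

```python
# Illustrative time-series patching ("temporal chunking"): a 1-D signal is
# split into overlapping fixed-length windows, one embedding token per patch.
import numpy as np

def patchify(series, patch_len=16, stride=8):
    """Split a 1-D signal into overlapping patches of shape (n_patches, patch_len)."""
    n = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride : i * stride + patch_len] for i in range(n)])

patches = patchify(np.arange(128, dtype=float))
print(patches.shape)  # (15, 16)
```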

📝 Abstract
Accurate analysis of industrial time-series big data is critical for the Prognostics and Health Management (PHM) of industrial equipment. While recent advancements in Large Language Models (LLMs) have shown promise in time-series analysis, existing methods typically focus on single-modality adaptations, failing to exploit the complementary nature of temporal signals, frequency-domain visual representations, and textual domain knowledge. In this paper, we propose TS-MLLM, a unified multi-modal large language model framework designed to jointly model temporal signals, frequency-domain images, and textual domain knowledge. Specifically, we first develop an Industrial Time-Series Patch Modeling branch to capture long-range temporal dynamics. To integrate cross-modal priors, we introduce a Spectrum-aware Vision-Language Model Adaptation (SVLMA) mechanism that enables the model to internalize frequency-domain patterns and semantic context. Furthermore, a Temporal-centric Multi-modal Attention Fusion (TMAF) mechanism is designed to actively retrieve relevant visual and textual cues using temporal features as queries, ensuring deep cross-modal alignment. Extensive experiments on multiple industrial benchmarks demonstrate that TS-MLLM significantly outperforms state-of-the-art methods, particularly in few-shot and complex scenarios. The results validate our framework's superior robustness, efficiency, and generalization capabilities for industrial time-series prediction.
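The TMAF mechanism described in the abstract uses temporal features as queries that retrieve relevant visual and textual tokens. A minimal single-head sketch of that query-key-value pattern, assuming random projections in place of learned weights and illustrative dimensions (not the paper's exact implementation):

```python
# Sketch of temporal-centric cross-modal attention: time-series patch
# embeddings are queries; spectrum-image and text tokens form the shared
# key/value memory; a residual connection fuses the retrieved cues.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tmaf_fuse(temporal, visual, text, seed=0):
    """Fuse modalities with temporal features as queries.

    temporal: (n_patches, d)  time-series patch embeddings
    visual:   (n_vis, d)      spectrum-image tokens
    text:     (n_txt, d)      domain-knowledge text tokens
    Returns fused features of shape (n_patches, d).
    """
    rng = np.random.default_rng(seed)
    d = temporal.shape[1]
    # Random projections stand in for learned W_q, W_k, W_v matrices.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    kv = np.concatenate([visual, text], axis=0)        # cross-modal memory
    q, k, v = temporal @ w_q, kv @ w_k, kv @ w_v
    attn = softmax(q @ k.T / np.sqrt(d))               # (n_patches, n_vis + n_txt)
    return temporal + attn @ v                         # residual fusion

fused = tmaf_fuse(np.ones((8, 16)), np.ones((4, 16)), np.ones((3, 16)))
print(fused.shape)  # (8, 16)
```

The design choice to make temporal tokens the queries (rather than symmetric fusion) keeps the output sequence length tied to the time-series branch, which suits downstream prediction heads.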
Problem

Research questions and friction points this paper is trying to address.

Industrial Time-Series
Multi-Modal Learning
Large Language Models
Prognostics and Health Management
Cross-Modal Fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal Large Language Model
Time-Series Analysis
Spectrum-aware Vision-Language Adaptation
Temporal-centric Attention Fusion
Industrial Prognostics and Health Management
Haiteng Wang
School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
Yikang Li
School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
Yunfei Zhu
School of Software, Beihang University, Beijing 100191, China
Jingheng Yan
School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
Lei Ren
Li Auto
Laurence T. Yang
School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China; Department of Computer Science, St. Francis Xavier University, Antigonish, NS B2G 2W5, Canada