UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting

📅 2025-12-08
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing diffusion models for time series forecasting (TSF) are largely limited to unimodal numerical sequences and struggle to integrate heterogeneous modalities, such as textual descriptions and timestamps. To address this, the authors propose UniDiff, a unified diffusion framework for multimodal TSF. First, it tokenizes the series into patches and embeds each patch with a lightweight MLP, preserving local temporal dynamics. Second, a unified, parallel fusion module uses a single cross-attention mechanism to adaptively weigh and integrate structural information from timestamps and semantic context from texts in one step. Third, a classifier-free guidance mechanism for multi-source conditioning decouples the guidance strength of textual and temporal information at inference, which improves robustness. Experiments on real-world benchmark datasets across eight domains show that UniDiff achieves state-of-the-art performance.

📝 Abstract
As multimodal data proliferates across diverse real-world applications, leveraging heterogeneous information such as texts and timestamps for accurate time series forecasting (TSF) has become a critical challenge. While diffusion models demonstrate exceptional performance in generation tasks, their application to TSF remains largely confined to modeling single-modality numerical sequences, overlooking the abundant cross-modal signals inherent in complex heterogeneous data. To address this gap, we propose UniDiff, a unified diffusion framework for multimodal time series forecasting. To process the numerical sequence, our framework first tokenizes the time series into patches, preserving local temporal dynamics by mapping each patch to an embedding space via a lightweight MLP. At its core lies a unified and parallel fusion module, where a single cross-attention mechanism adaptively weighs and integrates structural information from timestamps and semantic context from texts in one step, enabling a flexible and efficient interplay between modalities. Furthermore, we introduce a novel classifier-free guidance mechanism designed for multi-source conditioning, allowing for decoupled control over the guidance strength of textual and temporal information during inference, which significantly enhances model robustness. Extensive experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
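The patch tokenization and MLP embedding described in the abstract can be illustrated with a minimal NumPy sketch. The patch length, stride, layer sizes, two-layer ReLU MLP, and random weights below are all assumptions chosen for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

def patchify(series, patch_len, stride):
    """Split a 1-D series into overlapping windows of shape (num_patches, patch_len)."""
    starts = range(0, len(series) - patch_len + 1, stride)
    return np.stack([series[s:s + patch_len] for s in starts])

def mlp_embed(patches, w1, b1, w2, b2):
    """Apply the same two-layer ReLU MLP to every patch independently."""
    hidden = np.maximum(patches @ w1 + b1, 0.0)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
series = rng.standard_normal(96)                      # toy look-back window
patch_len, stride, d_hidden, d_model = 16, 8, 32, 64  # hypothetical sizes

patches = patchify(series, patch_len, stride)
w1 = rng.standard_normal((patch_len, d_hidden)) * 0.1  # random stand-ins for learned weights
b1 = np.zeros(d_hidden)
w2 = rng.standard_normal((d_hidden, d_model)) * 0.1
b2 = np.zeros(d_model)

tokens = mlp_embed(patches, w1, b1, w2, b2)
print(patches.shape, tokens.shape)  # (11, 16) (11, 64)
```

Each patch keeps its raw local values until the MLP maps it to one token, which is one way to read the abstract's claim that patching preserves local temporal dynamics.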
Problem

Research questions and friction points this paper is trying to address.

How to leverage heterogeneous information such as texts and timestamps for accurate multimodal time series forecasting
How to integrate the cross-modal signals that diffusion-based TSF models, largely confined to single-modality numerical sequences, currently overlook
How to keep the model robust when it is conditioned on multiple sources at inference
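The abstract's unified fusion step, a single cross-attention that weighs timestamp and text information together, could look roughly like the sketch below. Concatenating the two condition sequences into one key/value context is an assumption about how a single attention call can see both modalities; the token counts, dimensions, and projection matrices are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fused_cross_attention(series_tokens, time_tokens, text_tokens, wq, wk, wv):
    """One cross-attention pass: series tokens query a joint timestamp + text context."""
    # Assumption: both condition sequences share one key/value context.
    context = np.concatenate([time_tokens, text_tokens], axis=0)
    q = series_tokens @ wq
    k = context @ wk
    v = context @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_series, n_time + n_text)
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 64                                       # hypothetical width
series_tokens = rng.standard_normal((11, d_model))
time_tokens = rng.standard_normal((11, d_model))   # e.g. embedded timestamps
text_tokens = rng.standard_normal((5, d_model))    # e.g. embedded text description
wq, wk, wv = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))

fused, weights = fused_cross_attention(series_tokens, time_tokens, text_tokens, wq, wk, wv)
print(fused.shape, weights.shape)  # (11, 64) (11, 16)
```

Because the softmax runs over the joint context, each series token distributes its attention between timestamp and text tokens in one step rather than through two sequential fusion stages.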
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified diffusion framework for multimodal time series forecasting
Cross-attention mechanism integrates timestamps and texts in parallel
Classifier-free guidance enables decoupled control over multimodal conditioning
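Decoupled guidance over two condition sources can be sketched as follows. The additive combination rule is a common multi-condition extension of classifier-free guidance and is assumed here; the source does not give UniDiff's exact formula. The names `w_text` and `w_time` and the toy noise predictions are hypothetical.

```python
import numpy as np

def multi_source_cfg(eps_uncond, eps_text, eps_time, w_text, w_time):
    """Combine noise predictions with a separate guidance weight per condition.

    eps_uncond: prediction with both conditions dropped
    eps_text:   prediction conditioned on text only
    eps_time:   prediction conditioned on timestamps only
    """
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)
            + w_time * (eps_time - eps_uncond))

rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal(96)  # toy stand-ins for denoiser outputs
eps_text = rng.standard_normal(96)
eps_time = rng.standard_normal(96)

# Emphasize textual guidance while keeping timestamp guidance mild.
eps = multi_source_cfg(eps_uncond, eps_text, eps_time, w_text=2.0, w_time=0.5)
print(eps.shape)  # (96,)
```

Setting both weights to zero recovers the unconditional prediction, and setting one weight to zero removes that modality's influence, which is the sense in which the two controls are decoupled.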
Authors
Da Zhang
School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an 710072, China, and also with the Institute of Artificial Intelligence (TeleAI), China Telecom, China
Bingyu Li
Institute of Artificial Intelligence (TeleAI), China Telecom, China
Zhuyuan Zhao
Institute of Artificial Intelligence (TeleAI), China Telecom, China
Junyu Gao
School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an 710072, China, and also with the Institute of Artificial Intelligence (TeleAI), China Telecom, China
Feiping Nie
OPTIMAL
Machine Learning, Pattern Recognition, Computer Vision, Data Mining, Artificial Intelligence
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom, China