AI Summary
Existing generative time series models struggle to capture complex dependencies among multivariate signals, and underperform in particular on covariate-aware forecasting. This work proposes DiTS, a multimodal diffusion Transformer architecture for time series modeling that treats endogenous and exogenous variables as distinct modalities. DiTS employs a dual-stream Transformer to separately model temporal autoregression and cross-variable interactions. It introduces a multimodal diffusion mechanism with dedicated temporal and variable attention modules, and exploits the low-rank structure of multivariate dependencies to reduce computational complexity. Extensive experiments demonstrate that DiTS significantly outperforms state-of-the-art deterministic deep forecasting models across multiple benchmark datasets, achieving leading performance both with and without access to future exogenous information.
Abstract
While generative modeling on time series enables more capable and flexible probabilistic forecasting, existing generative time series models handle the multi-dimensional structure of time series data poorly. The prevalent architecture of Diffusion Transformers (DiT), which relies on simplistic conditioning controls and a single-stream Transformer backbone, tends to underutilize cross-variate dependencies in covariate-aware forecasting. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS), a general-purpose architecture that frames endogenous and exogenous variates as distinct modalities. To better capture both inter-variate and intra-variate dependencies, we design a dual-stream Transformer block tailored to time series data, comprising a Time Attention module for autoregressive modeling along the temporal dimension and a Variate Attention module for cross-variate modeling. Unlike the common approach for images, which flattens 2D token grids into 1D sequences, our design leverages the low-rank property inherent in multivariate dependencies, thereby reducing computational costs. Experiments show that DiTS achieves state-of-the-art performance across benchmarks, regardless of the presence of future exogenous variate observations, demonstrating unique generative forecasting strengths over traditional deterministic deep forecasting models.
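The dual-stream idea described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: learned projections are omitted, the data is random, and a Perceiver-style latent bottleneck with `r` summary tokens stands in for the paper's low-rank cross-variate mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention, batched over all leading axes
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
V, T, d, r = 8, 24, 16, 2   # variates, time steps, model dim, low rank (r << V)

# Token grid: one d-dimensional token per (variate, time step) pair.
X = rng.standard_normal((V, T, d))

# --- Time Attention: each variate attends along its own temporal axis ---
out_time = attention(X, X, X)            # (V, T, d); batched over variates

# --- Variate Attention with a low-rank bottleneck ---
# Instead of a full V x V attention map per time step (the cost a
# flattened 2D grid would incur), cross-variate interaction is routed
# through r hypothetical latent tokens, cutting the per-step cost
# from O(V^2) to O(V * r).
Xt = np.swapaxes(X, 0, 1)                # (T, V, d); batched over time steps
L = rng.standard_normal((r, d))          # latent "summary" tokens (assumed)
Z = attention(np.broadcast_to(L, (T, r, d)), Xt, Xt)  # (T, r, d): compress variates
out_var = np.swapaxes(attention(Xt, Z, Z), 0, 1)      # (V, T, d): redistribute
```

Running the two streams on the same token grid and summing or gating their outputs would give one dual-stream block; how DiTS actually fuses the streams and injects diffusion-timestep conditioning is not specified here.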