AI Summary
Existing generative time series models struggle to capture complex dependencies among multivariate signals, and underperform in particular on covariate-aware forecasting. This work proposes DiTS, a multimodal diffusion Transformer architecture for time series modeling that treats endogenous and exogenous variables as distinct modalities. DiTS employs a dual-stream Transformer to separately model temporal autoregression and cross-variable interactions. It introduces a multimodal diffusion mechanism with dedicated temporal and variable attention modules, and exploits the low-rank structure of multivariate dependencies to reduce computational complexity. Extensive experiments demonstrate that DiTS significantly outperforms state-of-the-art deterministic deep forecasting models across multiple benchmark datasets, achieving leading performance both with and without access to future exogenous information.
Abstract
While generative modeling on time series enables more capable and flexible probabilistic forecasting, existing generative time series models handle the multi-dimensional structure of time series data poorly. The prevalent architecture of Diffusion Transformers (DiT), which relies on simplistic conditioning controls and a single-stream Transformer backbone, tends to underutilize cross-variate dependencies in covariate-aware forecasting. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS), a general-purpose architecture that frames endogenous and exogenous variates as distinct modalities. To better capture both inter-variate and intra-variate dependencies, we design a dual-stream Transformer block tailored to time series data, comprising a Time Attention module for autoregressive modeling along the temporal dimension and a Variate Attention module for cross-variate modeling. Unlike the common approach for images, which flattens 2D token grids into 1D sequences, our design leverages the low-rank property inherent in multivariate dependencies, thereby reducing computational costs. Experiments show that DiTS achieves state-of-the-art performance across benchmarks, regardless of the presence of future exogenous variate observations, demonstrating unique generative forecasting strengths over traditional deterministic deep forecasting models.
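The dual-stream idea described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: learned projections are omitted, the data is random, and a Perceiver-style latent bottleneck with `r` summary tokens stands in for the paper's low-rank cross-variate mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention, batched over all leading axes
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
V, T, d, r = 8, 24, 16, 2   # variates, time steps, model dim, low rank (r << V)

# Token grid: one d-dimensional token per (variate, time step) pair.
X = rng.standard_normal((V, T, d))

# --- Time Attention: each variate attends along its own temporal axis ---
out_time = attention(X, X, X)            # (V, T, d); batched over variates

# --- Variate Attention with a low-rank bottleneck ---
# Instead of a full V x V attention map per time step (the cost a
# flattened 2D grid would incur), cross-variate interaction is routed
# through r hypothetical latent tokens, cutting the per-step cost
# from O(V^2) to O(V * r).
Xt = np.swapaxes(X, 0, 1)                # (T, V, d); batched over time steps
L = rng.standard_normal((r, d))          # latent "summary" tokens (assumed)
Z = attention(np.broadcast_to(L, (T, r, d)), Xt, Xt)  # (T, r, d): compress variates
out_var = np.swapaxes(attention(Xt, Z, Z), 0, 1)      # (V, T, d): redistribute
```

Running the two streams on the same token grid and summing or gating their outputs would give one dual-stream block; how DiTS actually fuses the streams and injects diffusion-timestep conditioning is not specified here.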