🤖 AI Summary
Diffusion models underperform regression-based methods in time-series point forecasting because they supply too little contextual bias to track distribution shifts and must trade output diversity against the stability and precision that point forecasts require. This paper proposes SimDiff, a single-stage, end-to-end diffusion framework in which a single Transformer serves as both denoiser and point predictor, eliminating reliance on external pre-trained or jointly trained regressors. Key innovations include: (i) intrinsic output diversity coupled with multi-sample ensembling to improve mean squared error (MSE); (ii) normalization independence; and (iii) a median-of-means estimator that improves adaptability and prediction stability. SimDiff achieves state-of-the-art point forecasting accuracy across multiple benchmarks, outperforming both existing diffusion and regression approaches. Results demonstrate that a streamlined architecture can deliver superior efficiency and generalization without sacrificing predictive fidelity.
📝 Abstract
Diffusion models have recently shown promise in time series forecasting, particularly for probabilistic predictions. However, they often fail to achieve state-of-the-art point estimation performance compared to regression-based methods. This limitation stems from two difficulties: providing sufficient contextual bias to track distribution shifts, and balancing output diversity with the stability and precision required for point forecasts. Existing diffusion-based approaches mainly focus on full-distribution modeling under probabilistic frameworks, often with likelihood maximization objectives, while paying little attention to dedicated strategies for high-accuracy point estimation. Moreover, the existing diffusion methods that do target point prediction frequently rely on pre-trained or jointly trained mature models for contextual bias, which sacrifices the generative flexibility of diffusion models.
To address these challenges, we propose SimDiff, a single-stage, end-to-end framework. SimDiff employs a single unified Transformer network carefully tailored to serve as both denoiser and predictor, eliminating the need for external pre-trained or jointly trained regressors. It achieves state-of-the-art point estimation performance by leveraging intrinsic output diversity and by ensembling multiple inference samples to reduce mean squared error. Key innovations, including normalization independence and the median-of-means estimator, further enhance adaptability and stability. Extensive experiments demonstrate that SimDiff significantly outperforms existing methods in time series point forecasting.
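To make the median-of-means idea concrete, the following is a minimal sketch of the textbook estimator applied to a set of sampled forecasts. It is not taken from the paper: the function name `median_of_means`, the `n_groups` parameter, and the `(K, horizon, channels)` array layout are assumptions made for illustration, and SimDiff's actual grouping and aggregation may differ.

```python
import numpy as np


def median_of_means(samples, n_groups, rng=None):
    """Aggregate K sampled forecasts into one point forecast.

    samples  : array of shape (K, ...), e.g. (K, horizon, channels) (assumed layout)
    n_groups : number of groups to split the K samples into (hypothetical parameter)
    rng      : optional numpy Generator; if given, samples are shuffled before grouping
    """
    samples = np.asarray(samples, dtype=float)
    k = samples.shape[0]
    if not 1 <= n_groups <= k:
        raise ValueError("n_groups must be between 1 and the number of samples")
    if rng is not None:
        samples = samples[rng.permutation(k)]
    # Split along the sample axis into nearly equal groups.
    groups = np.array_split(samples, n_groups, axis=0)
    # Average within each group, then take the element-wise median across groups.
    group_means = np.stack([g.mean(axis=0) for g in groups], axis=0)
    return np.median(group_means, axis=0)
```

As a worked example, six sampled values 1, 1, 1, 1, 1, 100 split in order into three groups give group means 1, 1, and 50.5, so the estimate is 1, whereas the plain mean is 17.5. Setting `n_groups=1` recovers the plain ensemble mean, and setting it to the number of samples recovers the element-wise median, so the parameter trades variance reduction against robustness to outlying samples.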