DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a limitation of existing neural time series forecasters: they rarely expose distribution shift in residual uncertainty, whether that shift is abrupt, gradual, or horizon-dependent. The authors propose DeRegiME (Deep Regime Mixture of Experts), a direct multi-horizon probabilistic forecaster that combines an interpretable mean-residual-noise decomposition with implicit changepoint detection. A sparse variational Gaussian process separates the signal from latent uncertainty regimes; its nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes through a shared gate, yielding a single sparse-GP posterior rather than a mixture of GP experts. The shared stick-breaking gate prunes the effective number of regimes, and the learned regimes act as clusters of residual similarity. Across ten benchmarks and three encoder grids, the model improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with gains of 3.0% in continuous ranked probability score (CRPS) and 4.7% in mean squared error (MSE). The datasets span abrupt, gradual, and seasonal shifts.
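To make the headline metric concrete, here is a minimal sketch of how NLPD can be computed for a forecaster with a Student-t predictive distribution. It is not the authors' evaluation code: the function names, the toy targets, and the fixed degrees of freedom are all assumptions chosen for illustration.

```python
import math

def student_t_logpdf(y, loc, scale, df):
    """Log density of a location-scale Student-t distribution at y."""
    z = (y - loc) / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - 0.5 * (df + 1.0) * math.log1p(z * z / df)
    )

def nlpd(targets, locs, scales, df):
    """Average negative log predictive density (lower is better)."""
    total = sum(
        student_t_logpdf(y, m, s, df)
        for y, m, s in zip(targets, locs, scales)
    )
    return -total / len(targets)

# Hypothetical toy data: identical point forecasts, different predictive scales.
targets = [0.1, -0.3, 2.5, 0.0]
locs = [0.0, 0.0, 0.0, 0.0]
nlpd_narrow = nlpd(targets, locs, [0.2] * 4, df=4.0)  # overconfident scale
nlpd_wide = nlpd(targets, locs, [1.0] * 4, df=4.0)    # wider scale
print(nlpd_narrow, nlpd_wide)
```

Both toy forecasts share the same means and therefore the same MSE, yet the narrow one scores a worse (higher) NLPD because the target at 2.5 falls far into its tail. This is why a model can gain much more on NLPD than on MSE when it mainly improves the residual uncertainty rather than the conditional mean.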
📝 Abstract
We introduce DeRegiME -- Deep Regime Mixture of Experts -- a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whose nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes via a shared gate. This yields a single sparse-GP posterior, not a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads -- whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples -- rarely expose the regime structure of the residual. Yet distribution shift in noisy heteroskedastic time series may be abrupt, gradual, or horizon-dependent and often appears in residual uncertainty rather than the conditional mean. DeRegiME yields an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity whose transitions surface as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and predictive-density propriety, and across ten benchmarks and three encoder grids DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with parallel gains on CRPS (3.0%) and MSE (4.7%). Improvements are consistent across all datasets, which span abrupt, gradual, and seasonal shifts.
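The abstract does not spell out the functional form of the gate or the kernel. The sketch below uses one common construction that is consistent with the description, and it should be read as an assumption rather than the paper's definition: stick-breaking weights g_r(x) from logistic break fractions, and a mixing kernel k(x, x') = sum_r g_r(x) g_r(x') k_r(x, x') with RBF sub-kernels. All names, slopes, offsets, and lengthscales are hypothetical.

```python
import math
import random

def stick_breaking(fractions):
    """Map break fractions v_1..v_{R-1} in (0, 1) to R weights summing to 1."""
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(v * remaining)  # take a fraction v of the remaining stick
        remaining *= 1.0 - v
    weights.append(remaining)          # the last regime gets what is left
    return weights

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gate(x, slopes, offsets):
    """Input-dependent gate: logistic break fractions fed through stick-breaking."""
    return stick_breaking([sigmoid(s * x + b) for s, b in zip(slopes, offsets)])

def rbf(x, y, lengthscale):
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def regime_kernel(x, y, slopes, offsets, lengthscales):
    """Assumed form: sum over regimes of g_r(x) * g_r(y) * k_r(x, y)."""
    gx, gy = gate(x, slopes, offsets), gate(y, slopes, offsets)
    return sum(a * b * rbf(x, y, l) for a, b, l in zip(gx, gy, lengthscales))

def quad_form(matrix, v):
    n = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))

# Hypothetical three-regime setup; the gate switches softly near x = 0 and x = 2.
slopes, offsets = [-4.0, -4.0], [0.0, 8.0]
lengthscales = [0.5, 1.0, 2.0]

xs = [-3.0 + 0.5 * i for i in range(13)]
gram = [[regime_kernel(a, b, slopes, offsets, lengthscales) for b in xs] for a in xs]

# Numerical sanity check (not a proof): quadratic forms should be non-negative.
rng = random.Random(0)
min_quad = min(
    quad_form(gram, [rng.gauss(0.0, 1.0) for _ in xs]) for _ in range(200)
)

# Pruning illustration: a break fraction near 1 starves all later regimes.
pruned = stick_breaking([0.6, 0.99, 0.5])
print(min_quad, pruned)
```

Under this assumed form, each term g_r(x) g_r(x') k_r(x, x') is a product of two positive semidefinite kernels, so the sum is a valid kernel, and its feature map is the direct sum of the gated per-regime feature maps. The `pruned` weights come out at roughly [0.6, 0.396, 0.002, 0.002], showing how a stick-breaking gate can switch off surplus regimes without fixing their number in advance.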
Problem

Research questions and friction points this paper is trying to address.

distribution shift
probabilistic forecasting
uncertainty regimes
heteroskedastic time series
residual uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

regime mixture
sparse Gaussian process
distribution shift
probabilistic forecasting
nonstationary kernel