🤖 AI Summary
Probabilistic forecasting of irregularly sampled, multivariate time series with missing values—common in healthcare, astronomy, and climate science—remains challenging. Existing methods are limited to point-wise or univariate predictions and rely on restrictive parametric distributional assumptions, hindering flexible modeling of joint uncertainty across arbitrary timestamps and channels.
Method: We propose the first conditional normalizing flow framework tailored for irregular time series. It introduces a novel invertible triangular attention layer and fully real-valued invertible nonlinear activation functions, eliminating distributional priors and directly modeling the joint conditional distribution of future observations. The architecture integrates temporal encoders, invertible neural networks, and attention mechanisms.
Contribution/Results: Evaluated on four benchmark datasets, our method achieves up to a 4× improvement in log-likelihood over state-of-the-art approaches, demonstrating superior calibration and expressive power in capturing complex, time-varying dependencies.
📝 Abstract
Probabilistic forecasting of irregularly sampled multivariate time series with missing values is an important problem in many fields, including health care, astronomy, and climate. State-of-the-art methods for the task estimate only marginal distributions of observations in single channels and at single timepoints, assuming a fixed-shape parametric distribution. In this work, we propose a novel model, ProFITi, for probabilistic forecasting of irregularly sampled time series with missing values using conditional normalizing flows. The model learns joint distributions over the future values of the time series conditioned on past observations and queried channels and times, without assuming any fixed shape of the underlying distribution. As model components, we introduce a novel invertible triangular attention layer and an invertible non-linear activation function on and onto the whole real line. We conduct extensive experiments on four datasets and demonstrate that the proposed model provides $4$ times higher likelihood over the previously best model.