🤖 AI Summary
This work addresses distributional shift, bias, and overfitting in multivariate time series imputation caused by non-stationarity and systematic missingness. To this end, we propose the Distributionally Robust Imputation Objective (DRIO), which introduces distributionally robust optimization into time series imputation for the first time. DRIO models the mismatch between the observed and true data distributions with a Wasserstein ambiguity set and jointly minimizes the reconstruction error and the worst-case distributional divergence. Leveraging duality theory, the infinite-dimensional optimization over measures is reduced to an adversarial search over observed sample trajectories, enabling an adversarial learning algorithm compatible with deep neural networks. Extensive experiments on real-world datasets show that DRIO consistently delivers significant gains under both missing-completely-at-random and missing-not-at-random scenarios, attaining a Pareto-optimal trade-off between reconstruction accuracy and distributional consistency.
📝 Abstract
Multivariate time series (MTS) imputation is often compromised by a mismatch between the observed and true data distributions -- a bias exacerbated by non-stationarity and systematic missingness. Standard methods that minimize reconstruction error or encourage distributional alignment risk overfitting to these biased observations. We propose the Distributionally Robust Imputation Objective (DRIO), which jointly minimizes reconstruction error and the divergence between the imputer and a worst-case distribution within a Wasserstein ambiguity set. We derive a tractable dual formulation that reduces the infinite-dimensional optimization over measures to an adversarial search over sample trajectories, and propose an adversarial learning algorithm compatible with flexible deep learning backbones. Comprehensive experiments on diverse real-world datasets show that DRIO consistently improves imputation under both missing-completely-at-random and missing-not-at-random settings, reaching Pareto-optimal trade-offs between reconstruction accuracy and distributional alignment.
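The dual reduction described above -- replacing a sup over distributions in a Wasserstein ball with an adversarial search over perturbed samples -- can be illustrated in miniature. The sketch below is *not* the paper's DRIO algorithm; it is a generic Wasserstein-DRO training loop (assumed setup) for a scalar location model with squared loss, where the inner maximization over each perturbed sample has a closed form for a fixed dual penalty weight `gamma > 1`, and the outer minimization takes a gradient step at the adversarial samples (Danskin's theorem). All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for observed trajectories: 200 scalar samples from the "true" data.
z = rng.normal(loc=3.0, scale=1.0, size=200)

gamma = 4.0   # dual Wasserstein penalty weight (assumed fixed; > 1 keeps the inner problem concave)
theta = 0.0   # model parameter: here just a location estimate
lr = 0.1

for _ in range(200):
    # Inner maximization, in closed form for squared loss:
    #   z* = argmax_z (z - theta)^2 - gamma * (z - z_i)^2 = (gamma*z_i - theta) / (gamma - 1)
    z_adv = (gamma * z - theta) / (gamma - 1.0)
    # Outer minimization: by Danskin's theorem, the gradient of the sup equals
    # the gradient of the loss evaluated at the adversarial maximizer.
    grad = np.mean(2.0 * (theta - z_adv))
    theta -= lr * grad

# Each update "saw" worst-case shifted samples; for this symmetric toy loss the
# robust estimate still converges to the sample mean.
```

In the paper's setting the scalar samples become trajectories, the location model becomes a deep imputer, and the closed-form inner step becomes gradient ascent on the perturbed inputs, but the min-max structure is the same.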