🤖 AI Summary
To address the challenges of generality, modeling capacity, and scalability in missing value imputation for tabular and time-series data, this paper proposes Conditional Flow Matching Imputation (CFMI). CFMI is the first method to integrate conditional flow matching with a shared conditional encoder, yielding a differentiable, efficient, and high-dimensional scalable generative imputation framework that supports zero-shot time-series imputation. Technically, it unifies continuous normalizing flows and flow matching under a single conditional modeling paradigm, enabling joint optimization across data modalities. On 24 medium- and small-scale tabular benchmarks, CFMI matches or surpasses nine state-of-the-art baselines. For zero-shot time-series imputation, it achieves accuracy comparable to diffusion models while accelerating inference by several-fold. In high-dimensional settings, CFMI maintains robust performance, effectively alleviating the limitations of traditional multiple imputation—namely, inadequate modeling of complex dependencies and poor scalability.
📝 Abstract
We introduce conditional flow matching for imputation (CFMI), a new general-purpose method to impute missing data. The method combines continuous normalising flows, flow-matching, and shared conditional modelling to deal with intractabilities of traditional multiple imputation. Our comparison with nine classical and state-of-the-art imputation methods on 24 small to moderate-dimensional tabular data sets shows that CFMI matches or outperforms both traditional and modern techniques across a wide range of metrics. Applying the method to zero-shot imputation of time-series data, we find that it matches the accuracy of a related diffusion-based method while outperforming it in terms of computational efficiency. Overall, CFMI performs at least as well as traditional methods on lower-dimensional data while remaining scalable to high-dimensional settings, matching or exceeding the performance of other deep learning-based approaches, making it a go-to imputation method for a wide range of data types and dimensionalities.