CFMI: Flow Matching for Missing Data Imputation

📅 2025-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the generality, modeling-capacity, and scalability challenges of missing-value imputation for tabular and time-series data, this paper proposes conditional flow matching for imputation (CFMI). CFMI combines continuous normalizing flows, flow matching, and shared conditional modeling to deal with the intractabilities of traditional multiple imputation. On 24 small- to moderate-dimensional tabular data sets, CFMI matches or outperforms nine classical and state-of-the-art imputation methods across a wide range of metrics. For zero-shot time-series imputation, it matches the accuracy of a related diffusion-based method while being more computationally efficient. CFMI also remains scalable to high-dimensional settings, where it matches or exceeds the performance of other deep learning-based approaches.

📝 Abstract

We introduce conditional flow matching for imputation (CFMI), a new general-purpose method to impute missing data. The method combines continuous normalising flows, flow-matching, and shared conditional modelling to deal with intractabilities of traditional multiple imputation. Our comparison with nine classical and state-of-the-art imputation methods on 24 small to moderate-dimensional tabular data sets shows that CFMI matches or outperforms both traditional and modern techniques across a wide range of metrics. Applying the method to zero-shot imputation of time-series data, we find that it matches the accuracy of a related diffusion-based method while outperforming it in terms of computational efficiency. Overall, CFMI performs at least as well as traditional methods on lower-dimensional data while remaining scalable to high-dimensional settings, matching or exceeding the performance of other deep learning-based approaches, making it a go-to imputation method for a wide range of data types and dimensionalities.
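The abstract does not spell out the training objective, so the following is only a generic sketch of conditional flow matching applied to imputation, not necessarily the formulation CFMI uses. It assumes a straight-line path between Gaussian noise and a data row, a binary mask (1 = observed, 0 = missing), and a loss restricted to the missing coordinates. The function names (`make_training_example`, `masked_loss`, `impute`) and the `velocity` callable are hypothetical placeholders for a learned network.

```python
import random


def make_training_example(x, mask, rng):
    """Build one flow matching training example from a complete row x.

    Assumption: a linear path x_t = (1 - t) * noise + t * x, whose
    target velocity is x - noise. Observed entries are passed on as
    conditioning; masked-out entries are zero-filled.
    """
    t = rng.random()                                   # time in [0, 1)
    noise = [rng.gauss(0.0, 1.0) for _ in x]           # sample at t = 0
    x_t = [(1 - t) * n + t * d for n, d in zip(noise, x)]
    target_v = [d - n for n, d in zip(noise, x)]       # regression target
    cond = [d if m else 0.0 for d, m in zip(x, mask)]  # observed values only
    return t, x_t, cond, target_v


def masked_loss(v_pred, target_v, mask):
    """Mean squared error over the missing coordinates (mask == 0)."""
    n_missing = sum(1 for m in mask if not m)
    if n_missing == 0:
        return 0.0
    sq_err = sum((p - q) ** 2
                 for p, q, m in zip(v_pred, target_v, mask) if not m)
    return sq_err / n_missing


def impute(x_obs, mask, velocity, steps, rng):
    """Draw one imputation by Euler-integrating the velocity field.

    `velocity(t, x, cond, mask)` stands in for a trained model and must
    return one velocity component per coordinate.
    """
    cond = [d if m else 0.0 for d, m in zip(x_obs, mask)]
    x = [rng.gauss(0.0, 1.0) for _ in x_obs]           # start from noise
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(k * dt, x, cond, mask)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    # Observed entries are kept; only missing entries are filled in.
    return [o if m else xi for o, m, xi in zip(x_obs, mask, x)]
```

Under these assumptions, calling `impute` repeatedly with different random seeds would give several draws for the missing entries, which is how a generative model can serve multiple imputation. In practice `velocity` would be a neural network trained by minimising `masked_loss` over many examples from `make_training_example`.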

Problem

Research questions and friction points this paper is trying to address.

Imputing missing data using flow matching

Comparing CFMI with classical and modern imputation methods

Applying CFMI to zero-shot imputation of time-series data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines continuous normalising flows and flow-matching

Uses shared conditional modelling for missing data

Matches or outperforms traditional and modern imputation techniques

🔎 Similar Papers

Deep Learning for Multivariate Time Series Imputation: A Survey