Impute-MACFM: Imputation based on Mask-Aware Flow Matching

📅 2025-09-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing imputation methods for tabular data—particularly longitudinal healthcare data—suffer from overly restrictive assumptions about missingness mechanisms, limited modeling capacity, and unstable inference. To address these challenges, we propose Mask-Aware Flow Matching (MAFM), a novel framework that explicitly encodes missingness patterns as conditional signals. MAFM introduces mask-aware trajectory learning and time-decaying noise injection, imposing distinct dynamic evolution constraints on observed versus missing entries. It further incorporates stability regularization and consistency constraints, coupled with nonlinear noise scheduling, constraint-preserving ODE solvers, and multi-path ensemble inference. Evaluated on multiple benchmark datasets, MAFM achieves state-of-the-art performance, delivering superior imputation accuracy, enhanced robustness to diverse missingness mechanisms, and improved inference efficiency compared to existing approaches.

Technology Category

Application Category

📝 Abstract
Tabular data are central to many applications, especially longitudinal data in healthcare, where missing values are common, undermining model fidelity and reliability. Prior imputation methods either impose restrictive assumptions or struggle with complex cross-feature structure, while recent generative approaches suffer from instability and costly inference. We propose Impute-MACFM, a mask-aware conditional flow matching framework for tabular imputation that addresses missingness mechanisms, missing completely at random, missing at random, and missing not at random. Its mask-aware objective builds trajectories only on missing entries while constraining predicted velocity to remain near zero on observed entries, using flexible nonlinear schedules. Impute-MACFM combines: (i) stability penalties on observed positions, (ii) consistency regularization enforcing local invariance, and (iii) time-decayed noise injection for numeric features. Inference uses constraint-preserving ordinary differential equation integration with per-step projection to fix observed values, optionally aggregating multiple trajectories for robustness. Across diverse benchmarks, Impute-MACFM achieves state-of-the-art results while delivering more robust, efficient, and higher-quality imputation than competing approaches, establishing flow matching as a promising direction for tabular missing-data problems, including longitudinal data.
Problem

Research questions and friction points this paper is trying to address.

Addresses missing value imputation in tabular data with complex cross-feature dependencies
Handles multiple missingness mechanisms including MCAR, MAR and MNAR scenarios
Overcomes instability and costly inference limitations of prior generative approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mask-aware conditional flow matching for tabular imputation
Constraint-preserving ODE integration with projection
Stability penalties and consistency regularization for robustness
🔎 Similar Papers
No similar papers found.
D
Dengyi Liu
Dept. of Graduate Computer Science and Engineering, Yeshiva University, New York, USA
H
Honggang Wang
Dept. of Graduate Computer Science and Engineering, Yeshiva University, New York, USA
Hua Fang
Hua Fang
University of Massachusetts Dartmouth & Medical School
Statistical/Machine LearningComputational Statistics(Behavioral) Trajectory Pattern RecognitionMissing Data Analyses