🤖 AI Summary
Masking-one-out (MOO) evaluation, as commonly practiced, measures only prediction accuracy and neglects a model's ability to capture the randomness inherent in imputation. Method: We propose three modified MOO criteria, based on rank transformation, energy distance, and the likelihood principle, that jointly quantify the distributional fidelity and predictive utility of imputations. We also clarify the theoretical connection between MOO and the missing-at-random (MAR) assumption, establish a model selection theory for imputation with statistical consistency guarantees, and develop the prediction-imputation diagram, a two-dimensional visualization for comparing models. Contributions: Integrating semiparametric efficiency theory, the Bayesian Information Criterion (BIC), and statistical learning techniques, our approach ensures both asymptotic consistency and computational feasibility. The visualization tool enables intuitive comparison of multiple models at once. Overall, we offer an interpretable, verifiable paradigm for principled imputation model selection.
📝 Abstract
The masking-one-out (MOO) procedure, which masks an observed entry and compares it with its imputed value, is a common way to compare imputation models. We study the optimum of this procedure, generalize it under a missing-data assumption, and establish the corresponding semiparametric efficiency theory. However, MOO measures prediction accuracy, which is not ideal for evaluating an imputation model. To address this issue, we introduce three modified MOO criteria, based on rank transformation, energy distance, and the likelihood principle, that allow us to select an imputation model that properly accounts for the stochastic nature of the data. The likelihood approach further enables an elegant framework for learning an imputation model from the data, and we derive its statistical and computational learning theories as well as the consistency of BIC model selection. We also show how MOO is related to the missing-at-random assumption. Finally, we introduce the prediction-imputation diagram, a two-dimensional diagram that visually compares both the prediction and imputation utilities of various imputation models.
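To make the basic MOO procedure concrete, here is a minimal sketch in Python: each observed entry is masked in turn, the data are re-imputed, and the squared prediction error is averaged. The mean-imputation model and all function names below are our illustration only, not the paper's method or its modified criteria.

```python
import numpy as np

def moo_score(X, impute):
    """Masking-one-out: mask each observed entry in turn, re-impute the
    data, and return the mean squared prediction error (illustrative)."""
    errors = []
    for i, j in zip(*np.where(~np.isnan(X))):  # loop over observed entries
        X_masked = X.copy()
        X_masked[i, j] = np.nan          # mask one observed entry
        X_hat = impute(X_masked)         # impute on the masked data
        errors.append((X_hat[i, j] - X[i, j]) ** 2)
    return float(np.mean(errors))

def mean_impute(X):
    """Toy imputation model: fill each missing entry with its column mean."""
    col_means = np.nanmean(X, axis=0)
    X_filled = X.copy()
    rows, cols = np.where(np.isnan(X_filled))
    X_filled[rows, cols] = col_means[cols]
    return X_filled

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0]])
print(moo_score(X, mean_impute))  # → 10.0
```

A deterministic imputer like the one above can score well on this criterion while ignoring imputation randomness entirely, which is exactly the limitation the paper's rank-, energy-distance-, and likelihood-based criteria are designed to address.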