Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation

📅 2024-09-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Missing-data mechanisms are complex and often shift between training and testing, yet existing imputation methods lack holistic evaluation across imputation quality, fairness, and downstream model performance and stability. Method: We propose Shades-of-NULL, a comprehensive evaluation framework that (i) formally models coexisting missingness mechanisms and mechanism shift across train–test splits; (ii) establishes a diverse benchmark spanning synthetic and real-world missing-data scenarios; (iii) integrates 29 imputation methods with mainstream predictive models; and (iv) introduces a five-dimensional metric suite covering imputation quality, imputation fairness, and the predictive performance, fairness, and stability of downstream models. Contribution/Results: Through large-scale empirical analysis of 29,736 experimental pipelines, we uncover systematic trade-offs among predictive performance, fairness, and stability. We open-source the Shades-of-NULL toolkit to advance responsible, reproducible research on missing-value handling.

📝 Abstract

Data missingness is a practical challenge of sustained interest to the scientific community. In this paper, we present Shades-of-NULL, an evaluation suite for responsible missing value imputation. Our work is novel in two ways: (i) we model realistic and socially salient missingness scenarios that go beyond Rubin's classic Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR) settings to include multi-mechanism missingness (when different missingness patterns co-exist in the data) and missingness shift (when the missingness mechanism changes between training and test); and (ii) we evaluate imputers holistically, based on imputation quality and imputation fairness, as well as on the predictive performance, fairness, and stability of the models that are trained and tested on the data post-imputation. We use Shades-of-NULL to conduct a large-scale empirical study involving 29,736 experimental pipelines, and find that no single imputation approach performs best for all missingness types, and that interesting trade-offs arise between predictive performance, fairness, and stability, depending on the combination of missingness scenario, imputer choice, and the architecture of the predictive model. We make Shades-of-NULL publicly available to enable researchers to rigorously evaluate missing value imputation methods on a wide range of metrics in plausible and socially meaningful scenarios.
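The missingness scenarios named in the abstract can be illustrated with a small simulation. The sketch below is not the Shades-of-NULL implementation: `inject_missing`, the rank-based sampling weights, the synthetic data, and the 30% rate are all assumptions chosen for illustration. It shows MCAR, MAR, and MNAR injection, a multi-mechanism dataset in which all three co-exist, and a missingness shift in which the mechanism differs between the training and test splits.

```python
import numpy as np

def _rank_weights(values):
    # Hypothetical weighting: larger values get proportionally larger weights.
    return np.argsort(np.argsort(values)) + 1.0

def inject_missing(X, target_col, rate, mechanism, cond_col=None, rng=None):
    """Return a copy of X with NaNs placed in target_col (illustrative only).

    MCAR: every row is equally likely to lose its value.
    MAR:  the probability depends on another, fully observed column (cond_col).
    MNAR: the probability depends on the target column's own value.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = X.astype(float).copy()
    n = X.shape[0]
    k = int(round(rate * n))
    if mechanism == "MCAR":
        weights = np.ones(n)
    elif mechanism == "MAR":
        weights = _rank_weights(X[:, cond_col])
    elif mechanism == "MNAR":
        weights = _rank_weights(X[:, target_col])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    rows = rng.choice(n, size=k, replace=False, p=weights / weights.sum())
    X[rows, target_col] = np.nan
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))  # synthetic data; column 3 stays fully observed

# Multi-mechanism missingness: three mechanisms co-exist in one dataset.
X_multi = inject_missing(X, 0, 0.3, "MCAR", rng=rng)
X_multi = inject_missing(X_multi, 1, 0.3, "MAR", cond_col=3, rng=rng)
X_multi = inject_missing(X_multi, 2, 0.3, "MNAR", rng=rng)

# Missingness shift: MCAR in the training split, MNAR in the test split.
X_train = inject_missing(X[:700], 0, 0.3, "MCAR", rng=rng)
X_test = inject_missing(X[700:], 0, 0.3, "MNAR", rng=rng)
```

Under these assumptions, an imputer fitted on `X_train` sees values missing uniformly at random, while at test time the larger values of column 0 are disproportionately missing. This is the kind of mismatch that a missingness-shift scenario is meant to probe.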

Problem

Research questions and friction points this paper is trying to address.

Evaluate missing value imputation methods

Model realistic missingness scenarios

Assess imputation quality and fairness
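To make "imputation quality and fairness" concrete, here is a minimal sketch that scores an imputer on the cells it filled in, overall and per demographic group. The helper names, the toy data, the use of mean imputation, and the choice of an RMSE difference as the fairness gap are assumptions for illustration, not necessarily the metrics used in Shades-of-NULL.

```python
import numpy as np

def mean_impute(x_missing):
    # Fill every NaN with the mean of the observed values.
    filled = x_missing.copy()
    filled[np.isnan(filled)] = np.nanmean(x_missing)
    return filled

def rmse_on_imputed(x_true, x_missing, x_imputed, rows=None):
    # Score only the cells that were missing, optionally restricted to `rows`.
    mask = np.isnan(x_missing)
    if rows is not None:
        mask = mask & rows
    if not mask.any():
        return float("nan")
    return float(np.sqrt(np.mean((x_true[mask] - x_imputed[mask]) ** 2)))

# Toy ground truth with two groups whose value ranges differ.
x_true = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

x_missing = x_true.copy()
x_missing[[1, 6]] = np.nan  # one missing value in each group

x_imputed = mean_impute(x_missing)

rmse_all = rmse_on_imputed(x_true, x_missing, x_imputed)
rmse_g0 = rmse_on_imputed(x_true, x_missing, x_imputed, rows=(group == 0))
rmse_g1 = rmse_on_imputed(x_true, x_missing, x_imputed, rows=(group == 1))
quality_gap = rmse_g1 - rmse_g0  # hypothetical imputation-fairness measure
```

On this toy data the observed mean is 13, so the per-group RMSEs are 11 and 17 and the gap is 6. A single overall score would hide that the imputer is less accurate for one group than for the other.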

Innovation

Methods, ideas, or system contributions that make the work stand out.

Model multi-mechanism missingness scenarios

Evaluate imputers on quality and fairness

Conduct a large-scale empirical study of 29,736 experimental pipelines

🔎 Similar Papers

Deep Learning for Multivariate Time Series Imputation: A Survey