Learning Accurate Models on Incomplete Data with Minimal Imputation

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world data frequently contain missing values, and the conventional “impute-then-model” paradigm incurs substantial computational overhead while risking bias propagation. This paper proposes Minimal Imputation, a paradigm built on a formally defined minimal imputation problem: identify the smallest subset of missing entries to impute such that downstream model performance remains optimal. Grounded in statistical learning theory and combinatorial optimization, the authors develop exact and efficient approximation algorithms applicable to linear regression, tree-based models, and other common learners. Extensive experiments demonstrate that the approach reduces imputation time and computational resource consumption by over 70% on average, while preserving or even improving predictive accuracy. By decoupling imputation necessity from completeness, Minimal Imputation addresses the trade-off between imputation exhaustiveness and modeling fidelity.
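To make the problem statement concrete, here is a toy brute-force sketch in Python. It is not the paper's algorithm: it assumes a no-intercept least-squares model, treats a subset of missing cells as sufficient when the fitted weights match the fully imputed model for every candidate value of the cells left un-imputed, and searches subsets exhaustively (exponential time). The names `minimal_imputation`, `is_sufficient`, `imputed`, and `candidates` are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product

import numpy as np


def fit(X, y):
    """Least-squares weights for y ~ X @ w (no intercept)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def fill(X, cells, values):
    """Return a copy of X with the given cells set to the given values."""
    X = X.copy()
    for (r, c), v in zip(cells, values):
        X[r, c] = v
    return X


def is_sufficient(X, y, chosen, imputed, candidates):
    """True if imputing only `chosen` pins down the fully imputed model."""
    missing = list(imputed)
    reference = fit(fill(X, missing, [imputed[m] for m in missing]), y)
    fixed = fill(X, chosen, [imputed[m] for m in chosen])
    rest = [m for m in missing if m not in chosen]
    # Try every candidate value for every cell left un-imputed.
    for guess in product(candidates, repeat=len(rest)):
        if not np.allclose(fit(fill(fixed, rest, guess), y), reference):
            return False
    return True


def minimal_imputation(X, y, imputed, candidates):
    """Smallest set of missing cells to impute, found by exhaustive search."""
    missing = list(imputed)
    for k in range(len(missing) + 1):
        for chosen in combinations(missing, k):
            if is_sufficient(X, y, chosen, imputed, candidates):
                return list(chosen)


# Toy data (assumed): y = 2 * x0 exactly, so feature x1 is irrelevant.
X = np.array([[1.0, 5.0], [2.0, np.nan], [np.nan, 1.0], [4.0, 2.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
imputed = {(1, 1): 7.0, (2, 0): 3.0}   # values a hypothetical imputer returns
candidates = [-1.0, 0.0, 3.0, 7.0]     # values an un-imputed cell might take

result = minimal_imputation(X, y, imputed, candidates)
print(result)  # [(2, 0)]
```

In this toy case only the missing cell in the relevant feature needs imputing; the missing cell in the irrelevant feature can take any candidate value without changing the fitted weights.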

📝 Abstract
Missing data often exists in real-world datasets, requiring significant time and effort for imputation to learn accurate machine learning (ML) models. In this paper, we demonstrate that imputing all missing values is not always necessary to achieve an accurate ML model. We introduce the concept of minimal data imputation, which ensures accurate ML models trained over the imputed dataset. Implementing minimal imputation guarantees both minimal imputation effort and optimal ML models. We propose algorithms to find exact and approximate minimal imputation for various ML models. Our extensive experiments indicate that our proposed algorithms significantly reduce the time and effort required for data imputation.
Problem

Research questions and friction points this paper is trying to address.

Imputing every missing value takes significant time and effort
Whether an accurate ML model can be learned without imputing all missing values
How to find exact and approximate minimal imputations for various ML models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimal data imputation reduces the imputation effort needed to train ML models.
Algorithms find exact and approximate minimal imputation.
Ensures accurate ML models with less imputation time.
Cheng Zhen
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
Nischal Aryal
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
Arash Termehchy
School of EECS, Oregon State University
Data Management, Data Analytics
Prayoga
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
Garrett Biwer
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
Sankalp Patil
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA