Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limitation of conventional time-series imputation methods—which ignore class label information in time-series classification tasks—this paper proposes a label-guided supervised imputation framework. Our approach constructs a label-constrained proximity matrix using tree-based models (e.g., random forests or gradient-boosted trees), thereby embedding class semantics into the similarity metric and enabling conditional, weighted imputation. Unlike unsupervised imputation, our method explicitly models the joint distribution of labels and time-series observations, yielding discriminative imputations that better preserve class-separability. Experiments demonstrate that, despite non-zero reconstruction error in imputed values, downstream classification accuracy is significantly improved—validating the critical role of label guidance in time-series imputation. This work establishes a novel paradigm for supervised time-series imputation.

Technology Category

Application Category

📝 Abstract
Missing data is a common problem in time series data. Most methods for imputation ignore label information pertaining to the time series even if that information exists. In this paper, we provide a framework for missing data imputation in the context of time series classification, where each time series is associated with a categorical label. We define a means of imputing missing values conditional upon labels, the method being guided by powerful, existing supervised models designed for high accuracy in this task. From each model, we extract a tree-based proximity measure from which imputation can be applied. We show that imputation using this method generally provides richer information leading to higher classification accuracies, despite the imputed values differing from the true values.
Problem

Research questions and friction points this paper is trying to address.

Imputing missing values in time series classification tasks
Leveraging label information through forest-based proximity measures
Improving classification accuracy despite imperfect imputed values
Innovation

Methods, ideas, or system contributions that make the work stand out.

Label-guided imputation using forest-based proximities
Tree-based proximity measures extracted from supervised models
Conditional imputation on labels for time series classification
🔎 Similar Papers
No similar papers found.
Jake S. Rhodes
Jake S. Rhodes
Brigham Young University
Machine LearningData Science
A
Adam G. Rustad
Department of Computer Science, Brigham Young University, Provo, UT, USA
S
Sofia Pelagalli Maia
Department of Statistics, Brigham Young University, Provo, UT, USA
E
Evan Thacker
Department of Public Health, Brigham Young University, Provo, UT, USA
H
Hyunmi Choi
Department of Neurology, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY , USA
J
Jose Gutierrez
Department of Neurology, Miller School of Medicine, University of Miami, Miami, FL, USA
T
Tatjana Rundek
Department of Neurology, Miller School of Medicine, University of Miami, Miami, FL, USA
B
Ben Shaw
Department of Mathematics & Statistics, Utah State University, Logan, UT, USA