A PCA-based Data Prediction Method

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address missing value imputation in data science, this paper proposes a geometrically grounded method that integrates classical mathematics with machine learning. First, principal component analysis (PCA) is applied to the complete samples to construct a low-dimensional principal subspace. Next, each incomplete sample is represented by an affine subspace, that is, a translated (shifted) linear subspace, and the minimum distance between this affine subspace and the principal subspace is computed as a geometric similarity measure. Finally, the nearest complete sample under this distance is retrieved from a candidate set, and its corresponding feature values are used for imputation. The key contribution is the introduction of affine subspace distance as a principled geometric basis for missing value estimation, which yields a rigorous theoretical interpretation and a computationally tractable framework. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art imputation algorithms in preserving the intrinsic structural consistency of the original data.
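The summarized pipeline (build a PCA subspace, then pick the candidate closest to it) can be illustrated in a simplified form. This is a minimal sketch, not the authors' code: it assumes the candidate set is a finite list of fully specified vectors, so each candidate is a single point (the degenerate zero-dimensional affine subspace), that the number of retained components k is supplied by the user, and the helper names principal_subspace, distance_to_subspace and select_candidate are hypothetical.

```python
import numpy as np

def principal_subspace(X, k):
    """Return the mean and an orthonormal d x k basis of the top-k principal directions."""
    mu = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data, ordered by singular value.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T

def distance_to_subspace(x, mu, U):
    """Euclidean distance from point x to the affine subspace mu + span(U), U orthonormal."""
    r = x - mu
    # Subtract the orthogonal projection of r onto span(U); what remains is the gap.
    return np.linalg.norm(r - U @ (U.T @ r))

def select_candidate(X_complete, candidates, k):
    """Hypothetical helper: index and distance of the candidate closest to the PCA subspace.

    Assumption: candidates are complete vectors (points), not higher-dimensional subspaces.
    """
    mu, U = principal_subspace(X_complete, k)
    dists = [distance_to_subspace(c, mu, U) for c in candidates]
    best = int(np.argmin(dists))
    return best, dists[best]
```

For example, if the complete samples lie on the line t * (1, 2, 3) for t = 1, ..., 5, the candidate [2, 4, 6] lies in the principal subspace and is selected (index 0) with a distance of essentially zero, ahead of [2, 0, 6] and [2, 5, 6].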

📝 Abstract
The problem of choosing appropriate values for missing data is frequently encountered in data science. We describe a novel method that combines elements of traditional mathematics and machine learning to predict (impute) missing data. The method is based on the notion of distance between shifted linear subspaces representing the existing data and the candidate sets. The existing data set is represented by the subspace spanned by its first principal components. Solutions are given for the case of the Euclidean metric.
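For the Euclidean case mentioned above, the distance between two shifted subspaces reduces to a standard least-squares residual. The derivation below is textbook linear algebra offered as an illustration, not the paper's own formulation; the offsets a and b, the matrices U and V whose columns span the two direction spaces, W = [U V], and the Moore-Penrose pseudoinverse W^{+} are notation introduced here.

```latex
\begin{aligned}
\mathcal{A} &= a + \operatorname{span}(U), \qquad \mathcal{B} = b + \operatorname{span}(V), \qquad W = [\,U \;\; V\,],\\
d(\mathcal{A},\mathcal{B})
  &= \min_{s,\,t} \bigl\lVert (a + Us) - (b + Vt) \bigr\rVert_2
   = \min_{w \in \operatorname{col}(W)} \bigl\lVert w - (b - a) \bigr\rVert_2
   = \bigl\lVert (I - W W^{+})(b - a) \bigr\rVert_2 .
\end{aligned}
```

The middle step uses the fact that Us - Vt ranges over the column space of W, and W W^{+} is the orthogonal projector onto that space. The distance is therefore zero exactly when b - a lies in span(U) + span(V), i.e. when the two shifted subspaces intersect.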
Problem

Research questions and friction points this paper is trying to address.

Predicting missing data values using a PCA-based method
Combining traditional mathematics with machine learning techniques
Measuring distance between subspaces for data imputation
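The distance between two shifted subspaces can be computed numerically with a single least-squares solve. This is a minimal sketch with NumPy, assuming each subspace is given by an offset vector and a matrix whose columns span its direction space; the function name affine_subspace_distance is hypothetical.

```python
import numpy as np

def affine_subspace_distance(a, U, b, V):
    """Euclidean distance between the shifted subspaces a + span(U) and b + span(V).

    U (d x p) and V (d x q) hold spanning vectors as columns; they need not be orthonormal.
    """
    W = np.hstack([U, V])
    # Minimise ||W c - (b - a)|| over c. Since (a + U s) - (b + V t) = W (s, -t) - (b - a),
    # the least-squares residual is exactly the smallest gap between the two subspaces.
    c, *_ = np.linalg.lstsq(W, b - a, rcond=None)
    return float(np.linalg.norm(W @ c - (b - a)))
```

Two lines parallel to the x-axis, one shifted by one unit along y, are at distance 1, while the x-axis and a shifted y-axis in the same plane intersect and are at distance 0. Using np.linalg.lstsq instead of inverting W explicitly keeps the computation well defined when the columns of U and V are linearly dependent, as they are for parallel lines.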
Innovation

Methods, ideas, or system contributions that make the work stand out.

PCA-based method for missing data imputation
Uses distance between shifted linear subspaces
Combines traditional mathematics with machine learning
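Combining PCA with the shifted-subspace distance suggests one possible imputation rule: treat all completions of an incomplete sample as its candidate set (a shifted subspace spanned by the unit vectors of the missing coordinates) and choose the completion closest to the PCA subspace of the complete data. This is a minimal sketch of that reading of the abstract, not the authors' algorithm; it assumes missing entries are marked as NaN and that k is chosen by the user, and the function name impute_by_subspace_distance is hypothetical.

```python
import numpy as np

def impute_by_subspace_distance(X_complete, x, k):
    """Hypothetical helper: fill the NaN entries of x with the point of the candidate set
    {x with missing entries free} that is closest to the PCA subspace mu + span(U)."""
    mu = X_complete.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_complete - mu, full_matrices=False)
    U = Vt[:k].T                        # d x k orthonormal principal directions
    missing = np.isnan(x)
    x0 = np.where(missing, 0.0, x)      # fixed (observed) part of the candidate set
    E = np.eye(len(x))[:, missing]      # free directions: one unit vector per missing entry
    # Minimise ||(x0 + E z) - (mu + U s)|| jointly over z (imputed values) and s (PCA coords).
    A = np.hstack([E, -U])
    coef, *_ = np.linalg.lstsq(A, mu - x0, rcond=None)
    filled = x.copy()
    filled[missing] = coef[: missing.sum()]
    return filled
```

With complete samples on the line t * (1, 2, 3), the sample [2, NaN, 6] is filled as [2, 4, 6]. Whenever the columns of [E, -U] are linearly dependent (for instance when k plus the number of missing entries exceeds the dimension), the minimiser is not unique and np.linalg.lstsq returns the minimum-norm solution, so the choice of k affects the result.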