AI Summary
This work addresses the challenge of imputing biologically implausible missing values (e.g., zero-value artifacts in the UCI Diabetes dataset) in clinical data. We propose a gradient-free optimization framework that integrates Principal Component Analysis (PCA) with quantum-inspired state rotation. Crucially, reconstructed values are constrained to within ±2 standard deviations of the original feature distributions, which avoids overreliance on mean or median imputation and supports statistically faithful reconstruction. Several gradient-free optimizers (COBYLA, simulated annealing, and differential evolution) are used to minimize distributional divergence, quantified via the Wasserstein distance and Kolmogorov-Smirnov (KS) test statistics. Experiments show that the average Wasserstein distance decreases by over 85% and that KS p-values fall between 0.18 and 0.22, compared with p-values above 0.99 for conventional methods, which the work presents as evidence of better distributional fidelity. The approach improves the clinical plausibility and variability of the imputed data.
Abstract
Data imputation is a critical step in data pre-processing, particularly for datasets with missing or unreliable values. This study introduces a novel quantum-inspired imputation framework evaluated on the UCI Diabetes dataset, which contains biologically implausible missing values across several clinical features. The method integrates Principal Component Analysis (PCA) with quantum-assisted rotations, optimized through gradient-free classical optimizers (COBYLA, Simulated Annealing, and Differential Evolution), to reconstruct missing values while preserving statistical fidelity. Reconstructed values are constrained to within ±2 standard deviations of the original feature distributions, avoiding unrealistic clustering around central tendencies. This approach achieves a substantial and statistically significant improvement: an average reduction of over 85% in Wasserstein distance, and Kolmogorov-Smirnov test p-values between 0.18 and 0.22, compared with p-values > 0.99 for classical methods such as Mean, KNN, and MICE. The method also eliminates zero-value artifacts and enhances the realism and variability of imputed data. By combining quantum-inspired transformations with a scalable classical framework, this methodology provides a robust solution for imputation tasks in domains such as healthcare and AI pipelines, where data quality and integrity are crucial.
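The two divergence measures named above can be illustrated with a minimal, standard-library-only sketch. It is not the paper's pipeline: the data are synthetic (not the UCI Diabetes dataset), the helper names `ecdf`, `wasserstein_1d`, and `ks_statistic` are hypothetical, and the "clipped" imputer is a plain Gaussian draw clipped to ±2 standard deviations, standing in for the PCA and rotation steps, which are not specified here. It only shows why placing every missing entry at the mean distorts a feature's distribution more than values spread within the ±2 standard deviation band.

```python
import random
import statistics
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical CDF of an already sorted sample, evaluated at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance: integral of |F_a(x) - F_b(x)| dx."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    total = 0.0
    # Both empirical CDFs are constant on each interval [left, right).
    for left, right in zip(grid, grid[1:]):
        total += abs(ecdf(a, left) - ecdf(b, left)) * (right - left)
    return total


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


rng = random.Random(0)

# Synthetic stand-in for one clinical feature (an assumption, not real data).
observed = [rng.gauss(120.0, 30.0) for _ in range(500)]
mu = statistics.fmean(observed)
sigma = statistics.stdev(observed)
n_missing = 200

# Baseline: mean imputation places every missing entry at a single point.
mean_filled = observed + [mu] * n_missing

# Illustrative alternative: draw values, then clip them to mu +/- 2 sigma.
lo, hi = mu - 2 * sigma, mu + 2 * sigma
clipped = [min(max(rng.gauss(mu, sigma), lo), hi) for _ in range(n_missing)]
spread_filled = observed + clipped

for name, filled in [("mean", mean_filled), ("clipped", spread_filled)]:
    w = wasserstein_1d(observed, filled)
    d = ks_statistic(observed, filled)
    print(f"{name:8s} wasserstein={w:.3f} ks_stat={d:.3f}")
```

In practice, `scipy.stats.wasserstein_distance` and `scipy.stats.ks_2samp` provide these measures, and the latter also returns the p-value that the abstract reports; the hand-written versions above are only meant to make the definitions explicit.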