🤖 AI Summary
This study identifies a systematic bias introduced by dimensionality reduction in quantum machine learning (QML) evaluation. To mitigate resource constraints on NISQ devices and classical simulation bottlenecks, existing works routinely apply dimensionality reduction as a preprocessing step—yet its confounding effect on performance assessment remains underrecognized. Through large-scale comparative experiments—spanning synthetic and real-world datasets, classical methods (e.g., PCA), quantum encoding schemes (amplitude and angle encoding), diverse ansatz architectures, and mainstream quantum classifiers—we observe accuracy and F1-score fluctuations of 14–48% attributable to reduction. Critically, the bias magnitude arises from the coupled influence of data characteristics, encoding strategy, and circuit structure. We establish, for the first time, that dimensionality reduction is not a neutral preprocessing step but a critical confounder inducing erroneous model efficacy judgments; moreover, implicit compatibility among specific reduction–encoding–circuit combinations further distorts performance attribution. Our findings provide methodological warnings and practical benchmarks for fair QML evaluation.
📝 Abstract
Data dimensionality reduction techniques are often utilized in the implementation of Quantum Machine Learning models to address two significant issues: the constraints of NISQ quantum devices, which are characterized by noise and a limited number of qubits, and the challenge of simulating a large number of qubits on classical devices. It also raises concerns over the scalability of these approaches, as dimensionality reduction methods are slow to adapt to large datasets. In this article, we analyze how data reduction methods affect different QML models. We conduct this experiment over several generated datasets, quantum machine algorithms, quantum data encoding methods, and data reduction methods. All these models were evaluated on the performance metrics like accuracy, precision, recall, and F1 score. Our findings have led us to conclude that the usage of data dimensionality reduction methods results in skewed performance metric values, which results in wrongly estimating the actual performance of quantum machine learning models. There are several factors, along with data dimensionality reduction methods, that worsen this problem, such as characteristics of the datasets, classical to quantum information embedding methods, percentage of feature reduction, classical components associated with quantum models, and structure of quantum machine learning models. We consistently observed the difference in the accuracy range of 14% to 48% amongst these models, using data reduction and not using it. Apart from this, our observations have shown that some data reduction methods tend to perform better for some specific data embedding methodologies and ansatz constructions.