🤖 AI Summary
Quantum programs are prone to flaky tests (tests whose outcomes cannot be reproduced) because of quantum-specific features such as superposition and entanglement, yet existing research lacks systematic causal analysis and effective detection methods. Method: This paper investigates the root causes of quantum flakiness and introduces a novel machine learning-based platform for identifying flaky tests in quantum software. We propose a supervised learning framework that integrates static and dynamic program features, and we compare five models (XGBoost, decision tree, random forest, k-nearest neighbors, and support vector machine) under balanced and imbalanced data settings. Contribution/Results: XGBoost achieves the highest F1 score and Matthews Correlation Coefficient (MCC) on the balanced dataset, while the decision tree performs best on the imbalanced dataset. We also extend the currently limited dataset of quantum flaky tests and make it available to other researchers. Together, these contributions provide practical tooling and benchmark resources for quantum software reliability research.
📝 Abstract
Testing and debugging quantum software pose significant challenges due to the inherent complexities of quantum mechanics, such as superposition and entanglement. One challenge is indeterminacy, a fundamental characteristic of quantum systems, which increases the likelihood of flaky tests in quantum programs. To the best of our knowledge, the existing literature lacks comprehensive studies of quantum flakiness. In this paper, we present a novel machine learning platform that leverages multiple machine learning models to automatically detect flaky tests in quantum programs. Our evaluation shows that the extreme gradient boosting and decision tree-based models outperform the other models (namely random forest, k-nearest neighbors, and support vector machine), achieving the highest F1 score and Matthews Correlation Coefficient on a balanced dataset and an imbalanced dataset, respectively. Furthermore, we expand the currently limited dataset available to researchers interested in quantum flaky tests. In the future, we plan to explore unsupervised learning techniques to detect and classify quantum flaky tests more effectively. These advancements aim to improve the reliability and robustness of quantum software testing.
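
The two metrics named in the abstract can be computed directly from a binary confusion matrix. The sketch below is illustrative only: the labels are made up (1 = flaky, 0 = not flaky), the function names are hypothetical, and nothing here reproduces the paper's data or pipeline. It shows why F1 and the Matthews Correlation Coefficient can disagree on an imbalanced dataset, which is one plausible reason to report both.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 ('flaky') as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(tp, fp, fn, tn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical imbalanced labels: 90 flaky tests, 10 non-flaky tests.
y_true = [1] * 90 + [0] * 10

# A degenerate classifier that labels every test as flaky.
always_flaky = [1] * 100
counts_trivial = confusion_counts(y_true, always_flaky)
f1_trivial = f1_score(*counts_trivial)   # high, despite learning nothing
mcc_trivial = mcc(*counts_trivial)       # 0.0, no better than chance

# A perfect classifier, for comparison.
counts_perfect = confusion_counts(y_true, y_true)
f1_perfect = f1_score(*counts_perfect)
mcc_perfect = mcc(*counts_perfect)

print(f"always-flaky: F1={f1_trivial:.3f}, MCC={mcc_trivial:.3f}")
print(f"perfect:      F1={f1_perfect:.3f}, MCC={mcc_perfect:.3f}")
```

In this toy case the always-flaky classifier scores a high F1 because the positive class dominates, while its MCC is zero because MCC also accounts for true negatives.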