Black-box optimization and quantum annealing for filtering out mislabeled training instances

📅 2025-01-12
🤖 AI Summary
This work addresses training data purification under label noise. It proposes a surrogate-model-driven framework that combines black-box optimization (BBO), postprocessing, and quantum annealing to select training subsets. The surrogate model estimates the validation loss of candidate subsets, BBO with postprocessing iteratively refines those estimates, and D-Wave's clique sampler, running on a physical quantum annealer, samples diverse training subsets with low validation error. Experiments on a noisy majority bit task show that the method prioritizes the removal of high-risk mislabeled instances. Compared with OpenJij's simulated quantum annealing sampler and Neal's simulated annealing sampler, the D-Wave sampler achieves faster optimization and higher-quality training subsets, which supports the use of quantum annealing hardware for data cleaning.

📝 Abstract
This study proposes an approach for removing mislabeled instances from contaminated training datasets by combining surrogate model-based black-box optimization (BBO) with postprocessing and quantum annealing. Mislabeled training instances, a common issue in real-world datasets, often degrade model generalization, necessitating robust and efficient noise-removal strategies. The proposed method evaluates filtered training subsets based on validation loss, iteratively refines loss estimates through surrogate model-based BBO with postprocessing, and leverages quantum annealing to efficiently sample diverse training subsets with low validation error. Experiments on a noisy majority bit task demonstrate the method's ability to prioritize the removal of high-risk mislabeled instances. Integrating D-Wave's clique sampler running on a physical quantum annealer achieves faster optimization and higher-quality training subsets compared to OpenJij's simulated quantum annealing sampler or Neal's simulated annealing sampler, offering a scalable framework for enhancing dataset quality. This work highlights the effectiveness of the proposed method for supervised learning tasks, with future directions including its application to unsupervised learning, real-world datasets, and large-scale implementations.
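The loop described in the abstract can be illustrated with a minimal, classical sketch. Everything here is an assumption made for illustration and is not taken from the paper: the surrogate is a ridge-regressed quadratic (QUBO-style) model over a binary keep/remove mask, the sampler is a small hand-written simulated annealer standing in for the D-Wave, OpenJij, or Neal samplers, the classifier is a least-squares linear model, and the postprocessing step is omitted because the abstract does not specify it. All function names and parameter values are hypothetical.

```python
import numpy as np

def make_majority_data(n, d, flip_rate, rng):
    """Random bit strings labeled by their majority bit; each label is
    flipped independently with probability flip_rate."""
    X = rng.integers(0, 2, size=(n, d))
    y = (X.sum(axis=1) > d / 2).astype(int)
    flipped = rng.random(n) < flip_rate
    return X, np.where(flipped, 1 - y, y), flipped

def validation_loss(mask, X_tr, y_tr, X_val, y_val):
    """Black-box objective: validation error rate of a least-squares
    linear classifier trained only on the instances with mask == 1."""
    idx = mask.astype(bool)
    if idx.sum() == 0:
        return 1.0  # empty training subset: treat as worst case
    A = np.hstack([X_tr[idx], np.ones((idx.sum(), 1))])
    w, *_ = np.linalg.lstsq(A, 2.0 * y_tr[idx] - 1.0, rcond=None)
    pred = np.hstack([X_val, np.ones((len(X_val), 1))]) @ w > 0
    return float(np.mean(pred.astype(int) != y_val))

def fit_quadratic_surrogate(masks, losses, ridge=1.0):
    """Ridge-fit loss ~ sum_i Q_ii m_i + sum_{i<j} Q_ij m_i m_j (assumed surrogate form)."""
    n = masks.shape[1]
    iu = np.triu_indices(n, k=1)
    Phi = np.hstack([masks, masks[:, iu[0]] * masks[:, iu[1]]])
    gram = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    coef = np.linalg.solve(gram, Phi.T @ (losses - losses.mean()))
    Q = np.zeros((n, n))
    Q[np.diag_indices(n)] = coef[:n]   # linear terms on the diagonal
    Q[iu] = coef[n:]                   # pairwise terms in the upper triangle
    return Q

def anneal(Q, rng, sweeps=100):
    """Single-bit-flip simulated annealing on the surrogate energy
    (a classical stand-in for a quantum annealing sampler)."""
    n = Q.shape[0]
    off = Q + Q.T
    np.fill_diagonal(off, 0.0)
    m = rng.integers(0, 2, size=n)
    for t in np.geomspace(0.1, 0.001, sweeps):
        for i in rng.permutation(n):
            # Energy change caused by flipping bit i.
            delta = (1 - 2 * m[i]) * (Q[i, i] + off[i] @ m)
            if delta < 0 or rng.random() < np.exp(-delta / t):
                m[i] = 1 - m[i]
    return m

def bbo(X_tr, y_tr, X_val, y_val, rng, n_init=10, n_iter=5):
    """Alternate between fitting the surrogate and evaluating its sampled minimizer."""
    n = len(y_tr)
    masks = rng.integers(0, 2, size=(n_init, n))
    losses = np.array([validation_loss(m, X_tr, y_tr, X_val, y_val) for m in masks])
    for _ in range(n_iter):
        cand = anneal(fit_quadratic_surrogate(masks, losses), rng)
        masks = np.vstack([masks, cand])
        losses = np.append(losses, validation_loss(cand, X_tr, y_tr, X_val, y_val))
    return masks[np.argmin(losses)], losses

rng = np.random.default_rng(0)
X_tr, y_tr, flipped = make_majority_data(20, 5, 0.2, rng)   # noisy training set
X_val, y_val, _ = make_majority_data(50, 5, 0.0, rng)       # clean validation set
best_mask, losses = bbo(X_tr, y_tr, X_val, y_val, rng)
print("best validation error:", losses.min())
print("removed instances:", np.flatnonzero(best_mask == 0))
print("actually mislabeled:", np.flatnonzero(flipped))
```

Comparing the removed indices with the truly mislabeled ones gives a rough sense of whether the search favors dropping noisy instances. At this toy scale the outcome varies with the random seed, so it should not be read as a reproduction of the reported results.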
Problem

Research questions and friction points this paper is trying to address.

Error Correction
Training Dataset
Machine Learning Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum Computing
Black-box Optimization
Error Removal in Datasets
Makoto Otsuka
LiLz Inc., Okinawa, Japan; Graduate School of Information Sciences, Tohoku University, Miyagi, Japan
Keisuke Morita
Graduate School of Information Sciences, Tohoku University, Miyagi, Japan; Sigma-i Co., Ltd., Tokyo, Japan
Masayuki Ohzeki
Graduate School of Information Sciences, Tohoku University
Statistical Mechanics, Machine Learning, Spin Glass, Phase Transition, Quantum Information