Black-box optimization and quantum annealing for filtering out mislabeled training instances

📅 2025-01-12

🤖 AI Summary

This work addresses training data purification under label noise. We propose a surrogate-model-driven framework, Black-Box Optimization, Post-processing, and Quantum Annealing (BBO-Post-QA), that integrates Gaussian process surrogate modeling, iterative black-box optimization, and quantum annealing for subset selection. To our knowledge, this is the first application of a physical quantum annealer (D-Wave) to training set purification: the surrogate model estimates the validation loss of candidate subsets, black-box optimization guides the search, and the D-Wave clique sampler draws diverse training subsets with low validation error. A post-processing step further refines the selections. Experiments on a noisy majority bit task show that the method prioritizes the removal of high-risk mislabeled instances. Compared with OpenJij's simulated quantum annealing sampler and Neal's simulated annealing sampler, the D-Wave hardware implementation optimizes faster and yields higher-quality training subsets, supporting the use of quantum annealing for data cleaning.

📝 Abstract

This study proposes an approach for removing mislabeled instances from contaminated training datasets by combining surrogate model-based black-box optimization (BBO) with postprocessing and quantum annealing. Mislabeled training instances, a common issue in real-world datasets, often degrade model generalization, necessitating robust and efficient noise-removal strategies. The proposed method evaluates filtered training subsets based on validation loss, iteratively refines loss estimates through surrogate model-based BBO with postprocessing, and leverages quantum annealing to efficiently sample diverse training subsets with low validation error. Experiments on a noisy majority bit task demonstrate the method's ability to prioritize the removal of high-risk mislabeled instances. Integrating D-Wave's clique sampler running on a physical quantum annealer achieves faster optimization and higher-quality training subsets compared to OpenJij's simulated quantum annealing sampler or Neal's simulated annealing sampler, offering a scalable framework for enhancing dataset quality. This work highlights the effectiveness of the proposed method for supervised learning tasks, with future directions including its application to unsupervised learning, real-world datasets, and large-scale implementations.
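The loop described in the abstract (score a filtered subset by validation loss, fit a surrogate to the scores, sample a promising subset from the surrogate, repeat) can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not taken from the paper: the surrogate is a ridge-fitted quadratic (QUBO) model, the sampler is a small NumPy simulated annealer standing in for the D-Wave, OpenJij, or Neal samplers, the classifier is a ridge regressor, the validation set is clean, and the postprocessing step is omitted. All function names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_majority_data(n, n_bits=5, flip=0.0):
    """Random bit strings labeled by their majority bit; `flip` is the label-noise rate."""
    X = rng.integers(0, 2, size=(n, n_bits))
    y = (X.sum(axis=1) > n_bits / 2).astype(int)
    noisy = rng.random(n) < flip
    return X, np.where(noisy, 1 - y, y), noisy

def validation_loss(mask, X_tr, y_tr, X_val, y_val):
    """Fit a ridge regressor on the kept instances; return its squared error on validation data."""
    keep = mask.astype(bool)
    if keep.sum() < 2:
        return 1.0  # assumption: treat near-empty subsets as uninformative
    A = np.c_[X_tr[keep], np.ones(keep.sum())]
    w = np.linalg.solve(A.T @ A + 1e-2 * np.eye(A.shape[1]), A.T @ (2 * y_tr[keep] - 1))
    resid = np.c_[X_val, np.ones(len(X_val))] @ w - (2 * y_val - 1)
    return float(np.mean(resid ** 2))

def fit_qubo(masks, losses, lam=1.0):
    """Ridge-fit a quadratic surrogate, loss ~ m^T Q m + c, and return upper-triangular Q."""
    M = np.asarray(masks, dtype=float)
    n = M.shape[1]
    iu = np.triu_indices(n)
    pairs = (M[:, :, None] * M[:, None, :])[:, iu[0], iu[1]]
    feats = np.c_[pairs, np.ones(len(M))]
    coef = np.linalg.solve(feats.T @ feats + lam * np.eye(feats.shape[1]),
                           feats.T @ np.asarray(losses))
    Q = np.zeros((n, n))
    Q[iu] = coef[:-1]  # drop the constant term; it does not affect the minimizer
    return Q

def anneal(Q, n_sweeps=200, t_hot=0.1, t_cold=0.001):
    """Single-bit-flip simulated annealing on the QUBO (stand-in for an annealing sampler)."""
    n = Q.shape[0]
    m = rng.integers(0, 2, size=n)
    e = float(m @ Q @ m)
    for t in np.geomspace(t_hot, t_cold, n_sweeps):
        for i in rng.permutation(n):
            cand = m.copy()
            cand[i] ^= 1
            e_new = float(cand @ Q @ cand)
            # Metropolis rule: always accept improvements, sometimes accept worse moves
            if e_new < e or rng.random() < np.exp((e - e_new) / t):
                m, e = cand, e_new
    return m

def purify(n_train=20, n_val=200, n_init=30, n_iter=20):
    """Run the surrogate-based loop; return best mask, noise flags, best loss, full-set loss."""
    X_tr, y_tr, noisy = make_majority_data(n_train, flip=0.2)
    X_val, y_val, _ = make_majority_data(n_val)  # assumption: clean validation labels
    f = lambda m: validation_loss(m, X_tr, y_tr, X_val, y_val)
    # Start from the unfiltered set plus random subsets
    masks = [np.ones(n_train, dtype=int)]
    masks += [rng.integers(0, 2, size=n_train) for _ in range(n_init)]
    losses = [f(m) for m in masks]
    for _ in range(n_iter):
        m = anneal(fit_qubo(masks, losses))  # sample a subset the surrogate rates well
        masks.append(m)
        losses.append(f(m))                  # evaluate it for real and grow the dataset
    best = masks[int(np.argmin(losses))]
    return best, noisy, min(losses), losses[0]

if __name__ == "__main__":
    best, noisy, best_loss, full_loss = purify()
    print("kept instances:   ", best)
    print("mislabeled (true):", noisy.astype(int))
    print("validation loss: full set", full_loss, "best subset", best_loss)
```

Comparing the printed mask with the true noise flags gives a quick view of which mislabeled instances the search dropped. In a setup closer to the abstract, `anneal` would be replaced by a call to an annealing sampler, with the fitted `Q` passed as the QUBO.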

Problem

Research questions and friction points this paper is trying to address.

Error Correction

Training Dataset

Machine Learning Accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum Computing

Black-box Optimization

Error Removal in Datasets