Unlearning Offline Stochastic Multi-Armed Bandits

2026-05-01

AI Summary
This work introduces machine unlearning to the offline stochastic multi-armed bandit setting, with the goal of handling data-deletion requests and mitigating privacy risks. It formalizes the privacy constraint under both the fixed-sample and the distribution data models and measures utility by the quality of the decision made after unlearning. The proposed adaptive algorithms switch between two base methods, the Gaussian mechanism and rollback, depending on the data regime and the privacy constraint. A mixing procedure is further introduced to explain the rationale behind these two baselines. Theoretical analysis provides performance guarantees and lower bounds under both data models, and experiments validate the predicted privacy-utility trade-offs and demonstrate the effectiveness of the proposed methods.
๐Ÿ“ Abstract
Machine unlearning aims to remove the influence of specified data points from a learned model, offering a principled way to process data-deletion requests and mitigate privacy risks without full retraining. Prior work has mainly studied unlearning for unsupervised and supervised learning, leaving unlearning for sequential decision-making systems far less understood. We initiate the study of unlearning for a foundational sequential decision-making problem: offline stochastic multi-armed bandits (MAB). We formalize the privacy constraint for offline MAB and measure utility by the post-unlearning decision quality. We conduct a systematic study of both single- and multi-source unlearning scenarios under two data-generation models, the fixed-sample model and the distribution model. For these settings, our algorithmic design is built on two canonical base algorithms, the Gaussian mechanism and rollback, and we propose adaptive algorithms that switch between them according to the data regime and privacy constraint. We further introduce a mixing procedure that elucidates the rationale behind these baselines. We provide performance guarantees across the above settings and establish lower bounds under both dataset models. Experiments validate the predicted tradeoffs and demonstrate the effectiveness of the proposed methods.
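The abstract names the two base methods and the idea of switching between them, but not the exact rules, so the following Python sketch is only an illustration under stated assumptions, not the paper's algorithm. It assumes rewards in [0, 1], a per-arm dataset split into a hypothetical `base` batch and a later `update` batch that contains the deleted samples, the classical Gaussian-mechanism calibration for the noise scale, and a hypothetical switching rule that compares that noise scale with a Hoeffding-style width for the base-only mean. "Rollback" is read here as returning to the statistic computed before the affected batch arrived. All function and parameter names are made up for this example.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale; valid for 0 < epsilon < 1."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def unlearned_mean(base, update, num_deleted, epsilon, delta, rng, beta=0.05):
    """Estimate one arm's mean after deleting `num_deleted` samples of `update`.

    Assumes rewards lie in [0, 1] and `base` is non-empty.
    Returns (estimate, method_name).
    """
    full = list(base) + list(update)
    full_mean = sum(full) / len(full)
    if num_deleted == 0:
        return full_mean, "none"

    # Removing m of n samples in [0, 1] shifts the mean by at most m / n.
    sigma = gaussian_sigma(num_deleted / len(full), epsilon, delta)

    # Hypothetical proxy for what rollback costs: a Hoeffding-style width
    # for a mean estimated from the base batch alone.
    rollback_width = math.sqrt(math.log(2.0 / beta) / (2.0 * len(base)))

    if sigma <= rollback_width:
        # Gaussian mechanism: keep the full-data mean and mask it with noise.
        return full_mean + rng.gauss(0.0, sigma), "gaussian"
    # Rollback: return to the statistic computed before `update` arrived.
    return sum(base) / len(base), "rollback"


def select_arm(estimates):
    """Pick the arm with the largest post-unlearning estimate."""
    return max(estimates, key=estimates.get)
```

In this sketch, a small deletion from a large dataset needs little noise, so the Gaussian branch is taken; a tight privacy budget combined with a large base batch favors rollback, which adds no noise but discards the later samples. The selected arm is then the argmax over the per-arm estimates, which is one simple way to express "post-unlearning decision quality" as the utility.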
Problem

Research questions and friction points this paper is trying to address.

machine unlearning
offline stochastic multi-armed bandits
sequential decision-making
privacy constraint
data deletion
Innovation

Methods, ideas, or system contributions that make the work stand out.

machine unlearning
offline stochastic multi-armed bandits
privacy-preserving sequential decision-making
adaptive unlearning algorithm
Gaussian mechanism