Unlearning Offline Stochastic Multi-Armed Bandits

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work introduces machine unlearning to the offline stochastic multi-armed bandit (MAB) setting for the first time, giving a principled way to handle data-deletion requests and mitigate privacy risks. It formalizes the privacy constraint under two data-generation models, the fixed-sample model and the distribution model, and measures utility by post-unlearning decision quality. The proposed algorithms build on two base methods, the Gaussian mechanism and rollback, and switch between them adaptively according to the data regime and the privacy constraint. A mixing procedure is further introduced to explain the rationale behind these base methods. The analysis provides performance guarantees and lower bounds under both dataset models, and experiments validate the predicted privacy-utility tradeoffs and the effectiveness of the proposed methods.

📝 Abstract

Machine unlearning aims to unlearn data points from a learned model, offering a principled way to process data-deletion requests and mitigate privacy risks without full retraining. Prior work has mainly studied unlearning for unsupervised and supervised learning, leaving unlearning for sequential decision-making systems far less understood. We initiate the study of unlearning for a foundational sequential decision-making problem: offline stochastic multi-armed bandits (MAB). We formalize the privacy constraint for offline MAB and measure utility by the post-unlearning decision quality. We conduct a systematic study of both single- and multi-source unlearning scenarios under two data-generation models, the fixed-sample model and the distribution model. For these settings, our algorithmic design is built on two canonical base algorithms, the Gaussian mechanism and rollback, and we propose adaptive algorithms that switch between them according to the data regime and privacy constraint. We further introduce a mixing procedure that elucidates the rationale behind these baselines. We provide performance guarantees across the above settings and establish lower bounds under both dataset models. Experiments validate the predicted tradeoffs and demonstrate the effectiveness of the proposed methods.
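
To make the Gaussian-mechanism idea from the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's algorithm. It assumes rewards lie in [0, 1], that the learner stores one empirical mean per arm, and that unlearning perturbs those stored means with Gaussian noise instead of retraining. The function names (`l2_sensitivity`, `classical_sigma`, `noisy_arm_choice`, `suboptimality`) are hypothetical, and the noise scale uses the classical Gaussian-mechanism calibration, which may differ from the paper's. Rollback and the adaptive switching rule are not sketched because the abstract does not specify them.

```python
import math
import numpy as np

def l2_sensitivity(n_per_arm, k_per_arm):
    """Bound the L2 gap between full-data and retained-data mean vectors.

    Assumes rewards in [0, 1] and k < n for every arm: deleting k of an
    arm's n samples then moves that arm's empirical mean by at most k / n.
    """
    n = np.asarray(n_per_arm, dtype=float)
    k = np.asarray(k_per_arm, dtype=float)
    return float(np.sqrt(np.sum((k / n) ** 2)))

def classical_sigma(sensitivity, eps, delta):
    """Classical Gaussian-mechanism noise scale (stated for 0 < eps < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

def noisy_arm_choice(full_means, sigma, rng):
    """Add Gaussian noise to the stored per-arm means and pick the best arm."""
    means = np.asarray(full_means, dtype=float)
    noisy = means + rng.normal(0.0, sigma, size=len(means))
    return int(np.argmax(noisy))

def suboptimality(true_means, arm):
    """Decision-quality proxy: gap between the best arm and the chosen arm."""
    true_means = np.asarray(true_means, dtype=float)
    return float(true_means.max() - true_means[arm])

# Illustrative numbers only: three arms, one sample deleted from arm 0.
rng = np.random.default_rng(0)
sens = l2_sensitivity([100, 100, 100], [1, 0, 0])
sigma = classical_sigma(sens, eps=0.5, delta=1e-5)
arm = noisy_arm_choice([0.2, 0.9, 0.5], sigma, rng)
gap = suboptimality([0.2, 0.9, 0.5], arm)
```

In this sketch a tighter privacy constraint (smaller `eps` or `delta`) or more deletions relative to the data size raises `sigma`, which makes the noisy choice more likely to miss the best arm. That is one way to read the privacy-utility tradeoff the abstract refers to.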

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

offline stochastic multi-armed bandits

sequential decision-making

privacy constraint

data deletion

Innovation

Methods, ideas, or system contributions that make the work stand out.

machine unlearning

offline stochastic multi-armed bandits

privacy-preserving sequential decision-making