AI Summary
This work studies sequential decision-making in nonparametric contextual multi-armed bandits (MAB) under local differential privacy (LDP) constraints, aiming to balance privacy preservation and learning utility in sensitive-data settings. To mitigate the statistical efficiency loss induced by LDP, we propose a uniform-confidence-bound-type estimator coupled with a jump-start scheme that exploits auxiliary data. We establish, for the first time, a minimax-optimal theoretical framework for LDP nonparametric MAB with auxiliary data and derive a matching information-theoretic lower bound. Our analysis proves that the proposed algorithm achieves the optimal convergence rate under LDP. Empirical evaluations on both synthetic and real-world datasets demonstrate that our method significantly outperforms existing baselines in the privacy-utility trade-off.
Abstract
Motivated by privacy concerns in sequential decision-making on sensitive data, we address the challenge of nonparametric contextual multi-armed bandits (MAB) under local differential privacy (LDP). We develop a uniform-confidence-bound-type estimator, showing its minimax optimality supported by a matching minimax lower bound. We further consider the case where auxiliary datasets are available, subject also to (possibly heterogeneous) LDP constraints. Under the widely-used covariate shift framework, we propose a jump-start scheme to effectively utilize the auxiliary data, the minimax optimality of which is further established by a matching lower bound. Comprehensive experiments on both synthetic and real-world datasets validate our theoretical results and underscore the effectiveness of the proposed methods.
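To make the LDP constraint concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a bounded reward. It is not the estimator proposed in this work; the function names `privatize_reward` and `private_mean`, and the assumption that rewards lie in [0, 1], are purely illustrative.

```python
import random
from statistics import fmean


def privatize_reward(reward, epsilon, rng):
    """Release one reward under epsilon-LDP using the Laplace mechanism.

    Assumption (illustrative): rewards lie in [0, 1], so the sensitivity is 1
    and Laplace noise with scale 1 / epsilon suffices.
    """
    clipped = min(max(reward, 0.0), 1.0)
    # The difference of two i.i.d. Exponential(rate=epsilon) draws is
    # Laplace-distributed with location 0 and scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return clipped + noise


def private_mean(rewards, epsilon, seed=0):
    """Average rewards that were each privatized locally before collection."""
    rng = random.Random(seed)
    return fmean(privatize_reward(r, epsilon, rng) for r in rewards)
```

Each privatized reward carries noise of variance 2 / epsilon^2, so the averaged estimate needs more samples as epsilon shrinks. This is the kind of statistical efficiency loss that the minimax analysis quantifies.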