AI Summary
This work investigates the user complexity of pure private agnostic learning under differential privacy, addressing both item-level (one example per user) and user-level (multiple examples per user) privacy settings. Methodologically, it integrates VC-dimension analysis, constructive perturbation techniques, tailored aggregation mechanisms, and refined empirical risk minimization. The results establish, for the first time, near-optimal sample complexity for general concept classes under item-level privacy; significantly tighten the user-level upper bound, improving upon Ghazi et al. (2023); and derive nearly tight user complexity bounds, between Ω(d) and O(d log* d), for threshold functions. The core contributions are (i) fundamental lower bounds on required user count for both privacy models, and (ii) matching, practically implementable algorithms achieving these bounds, thereby characterizing the theoretical limits of user-scale efficiency in pure private agnostic learning.
Abstract
Machine learning has made remarkable progress in a wide range of fields. In many scenarios, learning is performed on datasets containing sensitive information, where privacy protection is essential. In this work, we study pure private learning in the agnostic model, a framework that reflects the learning process in practice. We examine the number of users required under item-level privacy (where each user contributes one example) and user-level privacy (where each user contributes multiple examples) and derive several improved upper bounds. For item-level privacy, our algorithm achieves a near-optimal bound for general concept classes. We extend this to the user-level setting, yielding a tighter upper bound than the one proved by Ghazi et al. (2023). Lastly, we consider the problem of learning thresholds under user-level privacy and present an algorithm with a nearly tight user complexity.