🤖 AI Summary
This paper investigates semi-feature privacy under local differential privacy (LDP), where a subset of features is publicly released while the remaining features and the label must satisfy LDP constraints. We formally introduce the "semi-feature LDP" model, the first principled framework that departs from conventional full-feature perturbation. For nonparametric regression, we propose HistOfTree, an estimator that integrates histogram-based partitioning with adaptive tree-structured feature splitting, augmented by a data-driven hyperparameter selection strategy. We show that HistOfTree attains the minimax-optimal convergence rate under semi-feature LDP, and that this rate is strictly better than the minimax rate for the same problem under classical LDP. Experiments on synthetic and real-world datasets show that HistOfTree outperforms competing methods. The contributions span modeling, algorithm design, and theory: a new privacy model together with a provably optimal estimator under semi-feature LDP.
📝 Abstract
We initiate the study of locally differentially private (LDP) learning with public features. We define semi-feature LDP, where some features are publicly available while the remaining ones, along with the label, require protection under local differential privacy. Under semi-feature LDP, we show that the minimax convergence rate for nonparametric regression is significantly smaller than under classical LDP, meaning that the estimation error can decay faster. We then propose HistOfTree, an estimator that fully leverages the information contained in both public and private features. Theoretically, HistOfTree attains the minimax-optimal convergence rate. Empirically, HistOfTree achieves superior performance on both synthetic and real data. We also explore scenarios where users can manually select which features to protect. For these scenarios, we propose an estimator and a data-driven parameter tuning strategy, which lead to analogous theoretical and empirical results.
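
To make the privacy model concrete, here is a minimal sketch of how a single user record might be released under semi-feature LDP. It is not HistOfTree and not the paper's mechanism. It only illustrates the setting, using the standard Laplace mechanism with an even budget split across the private coordinates. The function names, the assumed ranges (private features in [0, 1], label in [-1, 1]), and the budget split are all assumptions made for this illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_semi_feature(x_public, x_private, y, epsilon, rng=None):
    """Release one record: public features as-is, the rest with Laplace noise.

    Hypothetical helper for illustration only. Assumes each private feature
    lies in [0, 1] (sensitivity 1) and the label lies in [-1, 1] (sensitivity 2).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    # Split the budget evenly over the private coordinates plus the label;
    # by basic composition the released record is epsilon-LDP overall.
    eps_each = epsilon / (len(x_private) + 1)

    # Clip to the assumed ranges so the stated sensitivities actually hold.
    clipped = [min(max(v, 0.0), 1.0) for v in x_private]
    noisy_private = [v + laplace_noise(1.0 / eps_each, rng) for v in clipped]

    y_clipped = min(max(y, -1.0), 1.0)
    noisy_y = y_clipped + laplace_noise(2.0 / eps_each, rng)

    # Public features carry no privacy constraint and are passed through.
    return list(x_public), noisy_private, noisy_y


if __name__ == "__main__":
    record = privatize_semi_feature(
        x_public=[0.2, 0.7], x_private=[0.5, 0.1, 0.9], y=0.3,
        epsilon=1.0, rng=random.Random(0),
    )
    print(record)
```

In this sketch the public features reach the learner exactly, while noise is added only to the private coordinates and the label. This is the structural difference from classical LDP, where every coordinate would be perturbed.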