Machine Learning with Privacy for Protected Attributes

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing differential privacy (DP) frameworks typically protect all features uniformly, failing to address scenarios where only specific sensitive attributes, not the entire feature set, require privacy protection. Method: This paper introduces Feature Differential Privacy (FDP), a simulation-based DP framework that allows a fine-grained, even adaptive, separation of protected and unprotected features, with implications for limiting attribute inference attacks. The authors propose a modified DP-SGD algorithm that provably satisfies FDP while retaining privacy amplification via subsampling. Results: Experiments on diffusion models trained on AFHQ show that at ε = 8, FDP achieves a Fréchet Inception Distance (FID) of 101.9, substantially outperforming the full-feature DP baseline (FID = 286.7). This work establishes a systematic DP paradigm for selective, feature-level privacy protection, balancing strong theoretical privacy guarantees with high model utility.
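To make the feature-level idea concrete, here is a conceptual sketch in NumPy, assuming a simple split of each input into public and protected features. This is not the paper's actual algorithm: the function name `fdp_sgd_step`, the mask-and-difference decomposition of the gradient, and all hyperparameters are illustrative assumptions. It only shows the general shape of a DP-SGD variant in which the clipping and Gaussian noise are applied to the protected features' gradient contribution rather than to the full per-example gradient.

```python
import numpy as np

def fdp_sgd_step(w, X_pub, X_priv, y, grad_fn, lr=0.1,
                 clip=1.0, sigma=1.0, rng=None):
    """One illustrative feature-level DP-SGD step (conceptual sketch only,
    NOT the paper's algorithm).

    Each example's gradient is computed twice: once on the full input and
    once with the protected features masked to zero.  The public part is
    used in the clear; only the per-example difference (the protected
    features' contribution) is clipped and has Gaussian noise added.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    pub_grads, priv_deltas = [], []
    for i in range(n):
        x_full = np.concatenate([X_pub[i], X_priv[i]])
        x_masked = np.concatenate([X_pub[i], np.zeros_like(X_priv[i])])
        g_full = grad_fn(w, x_full, y[i])
        g_pub = grad_fn(w, x_masked, y[i])
        delta = g_full - g_pub                      # protected contribution
        norm = np.linalg.norm(delta)
        delta = delta * min(1.0, clip / max(norm, 1e-12))  # per-example clip
        pub_grads.append(g_pub)
        priv_deltas.append(delta)
    noise = rng.normal(0.0, sigma * clip, size=w.shape)
    g = np.mean(pub_grads, axis=0) + (np.sum(priv_deltas, axis=0) + noise) / n
    return w - lr * g

# Toy usage with a linear model and squared loss (purely for demonstration).
def grad_fn(w, x, y):
    return (w @ x - y) * x

rng = np.random.default_rng(0)
X_pub = rng.normal(size=(8, 3))   # 3 public features per example
X_priv = rng.normal(size=(8, 2))  # 2 protected features per example
y = rng.normal(size=8)
w = np.zeros(5)
w = fdp_sgd_step(w, X_pub, X_priv, y, grad_fn, rng=rng)
```

The design point being illustrated is that the unclipped, unnoised public component can pass through at full fidelity, which is why FDP can recover so much utility relative to full-feature DP; the privacy accounting in the sketch (a single Gaussian-mechanism addition) is a placeholder for the paper's subsampling-amplified analysis.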

📝 Abstract
Differential privacy (DP) has become the standard for private data analysis. Certain machine learning applications only require privacy protection for specific protected attributes. Using naive variants of differential privacy in such use cases can result in unnecessary degradation of utility. In this work, we refine the definition of DP to create a more general and flexible framework that we call feature differential privacy (FDP). Our definition is simulation-based and allows for both addition/removal and replacement variants of privacy, and can handle arbitrary and adaptive separation of protected and non-protected features. We prove the properties of FDP, such as adaptive composition, and demonstrate its implications for limiting attribute inference attacks. We also propose a modification of the standard DP-SGD algorithm that satisfies FDP while leveraging desirable properties such as amplification via sub-sampling. We apply our framework to various machine learning tasks and show that it can significantly improve the utility of DP-trained models when public features are available. For example, we train diffusion models on the AFHQ dataset of animal faces and observe a drastic improvement in FID compared to DP, from 286.7 to 101.9 at $ε=8$, assuming that the blurred version of a training image is available as a public feature. Overall, our work provides a new approach to private data analysis that can help reduce the utility cost of DP while still providing strong privacy guarantees.
Problem

Research questions and friction points this paper is trying to address.

Standard DP protects all features uniformly, even when only specific attributes are sensitive
Naive DP variants cause unnecessary utility degradation in such use cases
Goal: strong privacy guarantees for protected attributes without sacrificing model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces feature differential privacy (FDP), a simulation-based framework with adaptive composition
Modifies DP-SGD to satisfy FDP while retaining amplification via subsampling
Significantly improves utility of DP-trained models when public features are available (FID 286.7 → 101.9 on AFHQ at ε = 8)