🤖 AI Summary
Machine learning services often inadvertently leak users' private attributes (e.g., gender, race) during data collection, posing serious privacy risks. Existing adversarial training–based approaches for private attribute protection are inherently fragile and struggle to simultaneously ensure privacy preservation and downstream task utility. To address this, we propose an information-theoretic, differentiable random sample replacement paradigm that explicitly models the mutual information between private attributes and learned features. Our method introduces a stochastic, differentiable replacement mechanism coupled with a customized loss function to achieve strict statistical decoupling of private attributes from representations. Crucially, it avoids the instability and optimization challenges inherent in adversarial training and generalizes across multiple modalities, including images, sensor signals, and speech. Experiments demonstrate that our approach reduces private attribute prediction accuracy by over 60% while incurring less than 2% degradation in downstream task performance, substantially outperforming state-of-the-art methods.
📝 Abstract
The growth of Machine Learning (ML) services requires extensive collection of user data, which may inadvertently include private information irrelevant to the services. Various methods have been proposed to protect private attributes by removing them from the data while maintaining the data's utility for downstream tasks. Nevertheless, as we show theoretically and empirically in this paper, these methods exhibit severe vulnerability because of a common weakness rooted in their adversarial training–based strategies. To overcome this limitation, we propose a novel approach, PASS, designed to stochastically substitute the original sample with another one according to certain probabilities. PASS is trained with a novel loss function soundly derived from an information-theoretic objective defined for utility-preserving private attribute protection. A comprehensive evaluation of PASS on datasets of different modalities, including facial images, human activity sensory signals, and voice recordings, substantiates PASS's effectiveness and generalizability.
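To make the "stochastic substitution according to certain probabilities" concrete, one common way to make such a discrete replacement step differentiable is a Gumbel-softmax relaxation over candidate samples. The NumPy sketch below illustrates that idea only; the function names, the candidate-pool setup, and the choice of Gumbel-softmax are our assumptions for illustration, not the paper's actual PASS mechanism.

```python
import numpy as np

def gumbel_softmax_weights(logits, tau, rng):
    """Sample relaxed one-hot selection weights via the Gumbel-softmax trick."""
    # Gumbel(0, 1) noise makes the argmax of (logits + g) a categorical sample.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + g) / tau
    z -= z.max()  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def stochastic_replace(candidates, logits, tau, rng):
    """Replace a sample with a convex combination of candidate samples.

    As tau -> 0 the weights approach a hard one-hot choice (true random
    sample replacement); at larger tau the substitution stays soft, so in
    an autodiff framework gradients could flow back into `logits`.
    """
    w = gumbel_softmax_weights(logits, tau, rng)
    return w @ candidates, w

# Hypothetical usage: 4 candidate replacement samples of dimension 3.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(4, 3))
logits = np.array([0.1, 2.5, -1.0, 0.3])  # illustrative learned replacement scores
x_new, w = stochastic_replace(candidates, logits, tau=0.1, rng=rng)
```

In an actual training pipeline the `logits` would be produced by a learned network and optimized jointly with the utility and privacy terms of the loss, which is where the differentiability of the relaxation matters.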