Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD

📅 2026-02-25
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the vulnerability of intermediate representations (IRs) in the Embedding-as-an-Interface paradigm, where IRs can leak membership information of training samples, exposing different layers to heterogeneous membership inference attack (MIA) risks. To mitigate this, the authors propose LM-DP-SGD, a novel method that introduces the first layer-wise MIA-risk-aware mechanism. By training shadow models on a public shadow dataset, the approach quantifies per-layer MIA risk via attack error rates and adaptively weights gradient updates in DP-SGD according to these risk levels. This enables differentiated privacy protection aligned with each layer's vulnerability under a fixed privacy budget. Experimental results demonstrate that LM-DP-SGD significantly reduces peak MIA risk at the IR level while preserving model utility, outperforming existing privacy-utility trade-off strategies.
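The risk-estimation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the threshold-based attack, the scoring function, and the error-to-weight mapping (attack error rate normalized into a per-layer weight, so that harder-to-attack layers retain more gradient signal) are our assumptions; the paper fits full layer-specific MIA adversaries on shadow-model IRs.

```python
import statistics

def layer_attack_error(member_scores, nonmember_scores):
    """Toy threshold MIA on one layer's IR-derived scores (e.g. confidence):
    predict 'member' when a score exceeds the midpoint of the two class means.
    Returns the attack error rate; 0.5 means the attack is no better than chance."""
    thr = (statistics.mean(member_scores) + statistics.mean(nonmember_scores)) / 2
    errors = sum(s <= thr for s in member_scores) + sum(s > thr for s in nonmember_scores)
    return errors / (len(member_scores) + len(nonmember_scores))

def risk_weights(error_rates):
    """Assumed mapping from per-layer attack error rates to gradient weights:
    a higher error rate means a weaker attack, i.e. lower MIA risk, so that
    layer keeps a larger share of the gradient signal. Normalized to sum to 1."""
    total = sum(error_rates) or 1.0
    return [e / total for e in error_rates]
```

For example, a layer whose attack error rate is near 0 (a near-perfect attack) receives a weight near 0, so the fixed DP noise dominates its gradient contribution during private training.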

πŸ“ Abstract
In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, LM-DP-SGD then uses these estimates to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
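The private update described in the abstract, layer-wise reweighting followed by global clipping and a single noise addition, might look like the following sketch. The function name, the list-of-lists gradient layout, and the standard DP-SGD noise calibration (standard deviation proportional to the clipping norm) are illustrative assumptions, not the paper's code.

```python
import math
import random

def lm_dp_sgd_step(per_example_grads, layer_weights, clip_norm, noise_std, rng=None):
    """One LM-DP-SGD update (sketch): scale each layer's per-example gradient
    by its MIA-risk-derived weight, clip the *global* (all-layer) L2 norm per
    example, sum, add Gaussian noise once, and average over the batch."""
    rng = rng or random.Random(0)
    num_layers = len(layer_weights)
    summed = [[0.0] * len(layer) for layer in per_example_grads[0]]
    for grads in per_example_grads:  # grads: one list of per-layer vectors per example
        scaled = [[layer_weights[l] * x for x in grads[l]] for l in range(num_layers)]
        gnorm = math.sqrt(sum(x * x for layer in scaled for x in layer))
        factor = min(1.0, clip_norm / (gnorm + 1e-12))  # global clipping
        for l in range(num_layers):
            for i, x in enumerate(scaled[l]):
                summed[l][i] += factor * x
    n = len(per_example_grads)
    # Noise std sigma * C, as in standard DP-SGD accounting (our assumption here).
    return [[(s + rng.gauss(0.0, noise_std * clip_norm)) / n for s in layer]
            for layer in summed]
```

Because clipping is applied to the globally reweighted gradient under a fixed noise magnitude, a down-weighted (high-risk) layer ends up with a lower signal-to-noise ratio, which is the mechanism giving vulnerable layers stronger protection.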
Problem

Research questions and friction points this paper is trying to address.

Membership Inference Attack
Intermediate Representations
Differential Privacy
Layer-wise Vulnerability
Privacy-Utility Trade-off
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise DP-SGD
Membership Inference Attack
Intermediate Representations
Privacy-Utility Trade-off
MIA-risk-aware