Robust Privacy: Inference-Time Privacy through Certified Robustness

📅 2026-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a Robust Privacy (RP) framework that integrates certified robustness with privacy protection during model inference, addressing the risk that machine learning models may inadvertently leak sensitive input attributes through their outputs. By enforcing prediction invariance within ℓ₂-norm neighborhoods of inputs, the framework ensures local stability, and an Attribute Privacy Enhancement (APE) mechanism translates this input-level invariance into provable attribute-level privacy guarantees. Experimental results on recommendation tasks show that RP substantially expands the interval of sensitive-attribute values compatible with a given prediction. Under Gaussian noise with σ=0.1, the success rate of model inversion attacks drops from 73% to 4% at the cost of some model performance; with no performance degradation at all, the attack success rate is still reduced to 44%.

📝 Abstract
Machine learning systems can produce personalized outputs that allow an adversary to infer sensitive input attributes at inference time. We introduce Robust Privacy (RP), an inference-time privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-$R$ neighborhood around an input $x$ (e.g., under the $\ell_2$ norm), then $x$ enjoys $R$-Robust Privacy, i.e., observing the prediction cannot distinguish $x$ from any input within distance $R$ of $x$. We further develop Attribute Privacy Enhancement (APE) to translate input-level invariance into an attribute-level privacy effect. In a controlled recommendation task where the decision depends primarily on a sensitive attribute, we show that RP expands the set of sensitive-attribute values compatible with a positive recommendation, thereby widening the inference interval. Finally, we empirically demonstrate that RP also mitigates model inversion attacks (MIAs) by masking fine-grained input-output dependence. Even at small noise levels ($\sigma=0.1$), RP reduces the attack success rate (ASR) from 73% to 4% with partial model performance degradation. RP can also partially mitigate MIAs (e.g., ASR drops to 44%) with no model performance degradation.
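The abstract defines $R$-Robust Privacy via prediction invariance within an $\ell_2$ ball, which mirrors certified robustness via randomized smoothing (Cohen et al.). The paper does not specify its certification procedure, so the following is only a minimal sketch under that assumption: a toy base classifier (`base_classifier` is a hypothetical stand-in, not the paper's model) is smoothed with Gaussian noise, and a certified radius $R = \sigma \, \Phi^{-1}(p)$ is estimated from the empirical agreement probability $p$. Any input within distance $R$ of $x$ would receive the same smoothed prediction, which is the privacy guarantee RP appeals to.

```python
import numpy as np
from statistics import NormalDist


def base_classifier(x):
    # Toy linear decision rule standing in for the model f; purely illustrative.
    return int(x.sum() > 0)


def certified_radius(x, sigma=0.1, n_samples=1000, seed=0):
    """Estimate an l2 certification radius via randomized smoothing.

    R = sigma * Phi^{-1}(p), where p is the empirical probability that
    the Gaussian-noised prediction matches the majority class. Within
    radius R of x, the smoothed prediction is provably unchanged, so an
    observer of the output cannot distinguish x from any such neighbor.
    """
    rng = np.random.default_rng(seed)
    preds = [base_classifier(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_samples)]
    p = max(np.mean(preds), 1.0 - np.mean(preds))
    p = min(p, 1.0 - 1e-6)  # clip to avoid an infinite radius when p == 1
    return sigma * NormalDist().inv_cdf(p)


x = np.array([0.5, 0.3, -0.2])
R = certified_radius(x, sigma=0.1)
print(f"certified l2 radius: {R:.3f}")
```

In RP's terms, this `R` is the radius within which the input enjoys $R$-Robust Privacy; larger `sigma` enlarges the radius (and the compatibility interval for a sensitive attribute) at the cost of model utility, matching the 4%-vs-44% ASR trade-off reported above.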
Problem

Research questions and friction points this paper is trying to address.

inference-time privacy
sensitive attribute inference
model inversion attacks
privacy leakage
adversarial inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust Privacy
Certified Robustness
Inference-Time Privacy
Model Inversion Attacks
Attribute Privacy Enhancement
Jiankai Jin
Xiangzheng Zhang
360
AI safety · Large language models · Information Retrieval
Zhao Liu
Deyue Zhang
Quanchen Zou
AI Security Lab