Membership Inference Attacks for Unseen Classes

📅 2025-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies membership inference attacks under a more extreme but realistic form of distribution shift: the auditor or adversary cannot access entire classes of the target model's training distribution (in the ImageNet experiments, up to 90% of classes are removed). In this class dropout setting, shadow model attacks, which assume background non-member data that matches the training distribution, degrade catastrophically. The paper shows that quantile regression attacks do not share this limitation and consistently outperform shadow model attacks, and it provides a theoretical model that illustrates the potential and limitations of the approach. On the unseen class of CIFAR-100, quantile regression attacks achieve up to 11× the true positive rate (TPR) of shadow model attacks; on ImageNet, they achieve nontrivial TPR even with 90% of training classes removed.

📝 Abstract

Shadow model attacks are the state-of-the-art approach for membership inference attacks on machine learning models. However, these attacks typically assume an adversary has access to a background (nonmember) data distribution that matches the distribution the target model was trained on. We initiate a study of membership inference attacks where the adversary or auditor cannot access an entire subclass from the distribution -- a more extreme but realistic version of distribution shift than has been studied previously. In this setting, we first show that the performance of shadow model attacks degrades catastrophically, and then demonstrate the promise of another approach, quantile regression, that does not have the same limitations. We show that quantile regression attacks consistently outperform shadow model attacks in the class dropout setting -- for example, quantile regression attacks achieve up to 11$\times$ the TPR of shadow models on the unseen class on CIFAR-100, and achieve nontrivial TPR on ImageNet even with 90% of training classes removed. We also provide a theoretical model that illustrates the potential and limitations of this approach.
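
As a rough illustration of the quantile regression idea mentioned in the abstract, the sketch below fits a regressor to the q-th quantile of a target model's scores on known non-members, then flags an example as a member when its score exceeds that predicted quantile. Everything here is an assumption made for illustration, not the paper's implementation: the scalar feature `x`, the synthetic score function, the linear regressor, and the plain subgradient descent on the pinball loss. Note that the sketch never trains a copy of the target model; it only needs the target's scores on non-member data.

```python
import random

def pinball_loss(residual, q):
    """Pinball (quantile) loss for residual = y - prediction at level q."""
    return q * residual if residual >= 0 else (q - 1) * residual

def fit_linear_quantile(xs, ys, q, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b at quantile q by subgradient descent on the mean pinball loss."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Subgradient of the pinball loss with respect to the prediction.
            g = -q if y > w * x + b else 1.0 - q
            grad_w += g * x
            grad_b += g
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

rng = random.Random(0)

def synthetic_score(x, member):
    # Toy assumption: scores grow with x, and members score higher on average.
    return 2.0 * x + rng.gauss(0.0, 0.5) + (1.5 if member else 0.0)

# Known non-members available to the attacker (synthetic stand-ins).
public_x = [rng.random() for _ in range(400)]
public_scores = [synthetic_score(x, member=False) for x in public_x]

q = 0.9  # the targeted false positive rate is 1 - q
w, b = fit_linear_quantile(public_x, public_scores, q)

def predict_member(x, score):
    """Flag as member when the score exceeds the predicted non-member quantile."""
    return score > w * x + b

# Hypothetical training-set members of the target model.
member_x = [rng.random() for _ in range(400)]
member_scores = [synthetic_score(x, member=True) for x in member_x]

fpr = sum(predict_member(x, s) for x, s in zip(public_x, public_scores)) / len(public_x)
tpr = sum(predict_member(x, s) for x, s in zip(member_x, member_scores)) / len(member_x)
print(f"FPR on public non-members: {fpr:.3f}, TPR on synthetic members: {tpr:.3f}")
```

Because the threshold is a per-example quantile of non-member scores, the false positive rate on the fitting data lands near 1 - q by construction; how well this carries over to classes the attacker never saw is the question the paper investigates.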

Problem

Research questions and friction points this paper is trying to address.

Study membership inference attacks with missing subclass data

Compare shadow model and quantile regression attack performance

Theoretical analysis of quantile regression attack potential and limits
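
The class dropout setting in the first item can be sketched as a data split in which whole classes are hidden from the attacker. This is a minimal sketch under stated assumptions: the helper name `drop_classes`, the toy dataset of 100 classes, and the seeded random choice of dropped classes are all hypothetical and are not taken from the paper.

```python
import random

def drop_classes(examples, drop_fraction, seed=0):
    """Split (x, label) pairs into the attacker's visible data and the unseen classes."""
    labels = sorted({label for _, label in examples})
    rng = random.Random(seed)
    n_drop = round(drop_fraction * len(labels))
    dropped = set(rng.sample(labels, n_drop))
    visible = [(x, y) for x, y in examples if y not in dropped]
    unseen = [(x, y) for x, y in examples if y in dropped]
    return visible, unseen, dropped

# Toy dataset: 1000 examples spread evenly over 100 class labels.
examples = [(i, i % 100) for i in range(1000)]

# Hide 90% of classes from the attacker, as in the most extreme setting described.
visible, unseen, dropped = drop_classes(examples, 0.9)
print(len(dropped), len(visible), len(unseen))
```

The attacker would fit its attack only on `visible`, while the attack is evaluated on examples from the `unseen` classes.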

Innovation

Methods, ideas, or system contributions that make the work stand out.

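Quantile regression attacks consistently outperform shadow model attacks in the class dropout setting

Remains effective on classes the adversary never observed

Achieves up to 11× the TPR of shadow models on the unseen CIFAR-100 class, and nontrivial TPR on ImageNet with 90% of training classes removed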

🔎 Similar Papers

Blind Baselines Beat Membership Inference Attacks for Foundation Models