🤖 AI Summary
This paper studies a more realistic form of distribution shift in membership inference attacks: the auditor or adversary can access only a subset of classes (for example, with 90% of classes missing). Conventional shadow-model attacks, which assume background (nonmember) data matching the target model's training distribution, degrade severely in this class dropout setting. We show that a quantile regression-based attack, which does not require alignment with the target model's training distribution, is more robust, and we provide a theoretical model that illustrates the potential and limitations of this approach. On the unseen class on CIFAR-100, quantile regression attacks achieve up to 11× the true positive rate (TPR) of shadow-model attacks; on ImageNet with 90% of training classes removed, they still achieve a nontrivial TPR.
📝 Abstract
Shadow model attacks are the state-of-the-art approach for membership inference attacks on machine learning models. However, these attacks typically assume an adversary has access to a background (nonmember) data distribution that matches the distribution the target model was trained on. We initiate a study of membership inference attacks where the adversary or auditor cannot access an entire subclass from the distribution -- a more extreme but realistic version of distribution shift than has been studied previously. In this setting, we first show that the performance of shadow model attacks degrades catastrophically, and then demonstrate the promise of another approach, quantile regression, that does not have the same limitations. We show that quantile regression attacks consistently outperform shadow model attacks in the class dropout setting -- for example, quantile regression attacks achieve up to 11$\times$ the TPR of shadow models on the unseen class on CIFAR-100, and achieve nontrivial TPR on ImageNet even with 90% of training classes removed. We also provide a theoretical model that illustrates the potential and limitations of this approach.
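To make the idea of a quantile regression attack concrete, the following is a minimal sketch on synthetic data, not the paper's implementation. It assumes each example has a single scalar feature `x` and a scalar attack score (for instance, the target model's confidence), fits a linear model of the (1 - target FPR) quantile of nonmember scores with the pinball loss, and flags an example as a member when its score exceeds the predicted quantile. All function names, the linear model, and the synthetic score distributions are illustrative assumptions.

```python
import random


def pinball_loss(y, q, tau):
    """Pinball (quantile) loss for one target y and one prediction q."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def fit_linear_quantile(xs, ys, tau, lr=0.5, steps=3000):
    """Fit q(x) = a + b*x by full-batch subgradient descent on the pinball loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Subgradient of the pinball loss with respect to the prediction.
            g = -tau if y > a + b * x else 1.0 - tau
            grad_a += g
            grad_b += g * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def flag_members(xs, scores, a, b):
    """Predict 'member' when the score exceeds the per-example threshold q(x)."""
    return [s > a + b * x for x, s in zip(xs, scores)]


def demo(seed=0, n=500, target_fpr=0.1):
    """Synthetic illustration: member scores are nonmember scores shifted up."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    # Assumption: scores depend linearly on x plus Gaussian noise.
    nonmember = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    member = [2.0 * x + 2.0 + rng.gauss(0.0, 1.0) for x in xs]
    # Only nonmember data is used to fit the threshold model.
    a, b = fit_linear_quantile(xs, nonmember, tau=1.0 - target_fpr)
    fpr = sum(flag_members(xs, nonmember, a, b)) / n
    tpr = sum(flag_members(xs, member, a, b)) / n
    return fpr, tpr


if __name__ == "__main__":
    fpr, tpr = demo()
    print(f"FPR={fpr:.3f} TPR={tpr:.3f}")
```

The threshold model is fit only on nonmember scores under the target model, so no shadow models are trained. In this sketch the false positive rate is measured on the same nonmembers used for fitting; a real evaluation would use held-out nonmembers and a more expressive quantile model.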