A Unified Perspective on Adversarial Membership Manipulation in Vision Models

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical gap in existing membership inference attacks, which typically assume unperturbed query inputs and overlook the privacy risks posed by adversarial examples. We systematically uncover, for the first time, the phenomenon of “adversarial membership manipulation” in vision models: imperceptible perturbations can cause non-member samples to be falsely classified as members of the training set, accompanied by a distinctive collapse trajectory in gradient norms. Building on this insight, we propose the first unified framework for detecting and defending against such manipulation, integrating geometric analysis of gradient features with a robust inference mechanism. Extensive experiments demonstrate that our approach effectively identifies and mitigates these attacks across diverse models and datasets, substantially enhancing the reliability of membership inference and strengthening privacy guarantees.
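
As a rough illustration of the fabrication mechanism the summary describes, the sketch below runs a PGD-style descent on the per-sample loss, exploiting the fact that loss-threshold MIAs flag low-loss inputs as members. The objective, step size, and budget here are illustrative assumptions, not the paper's actual attack procedure.

```python
# Hedged sketch of adversarial membership fabrication: perturb a
# non-member image so a loss-threshold MIA scores it as a "member".
# The paper's exact objective is not reproduced; this is a generic
# PGD-style illustration under an L-infinity budget.
import torch
import torch.nn.functional as F

def fabricate_membership(model, x, y, eps=8/255, alpha=2/255, steps=20):
    """Drive the per-sample loss of (x, y) down so it falls into the
    low-loss region that threshold MIAs associate with training members."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Descend the loss: lower loss looks more "member-like" to the MIA.
        x_adv = x_adv.detach() - alpha * grad.sign()
        # Project back into an imperceptible L-infinity ball and valid range.
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```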
📝 Abstract
Membership inference attacks (MIAs) aim to determine whether a specific data point was part of a model's training set, serving as effective tools for evaluating privacy leakage of vision models. However, existing MIAs implicitly assume honest query inputs, and their adversarial robustness remains unexplored. We show that MIAs for vision models expose a previously overlooked attack surface: adversarial membership manipulation, where imperceptible perturbations can reliably push non-member images into the "member" region of state-of-the-art MIAs. In this paper, we provide the first unified perspective on this phenomenon by analyzing its mechanism and implications. We begin by demonstrating that adversarial membership fabrication is consistently effective across diverse architectures and datasets. We then reveal a distinctive geometric signature (a characteristic gradient-norm collapse trajectory) that reliably separates fabricated from true members despite their nearly identical semantic representations. Building on this insight, we introduce a principled detection strategy grounded in gradient-geometry signals and develop a robust inference framework that substantially mitigates adversarial manipulation. Extensive experiments show that fabrication is broadly effective, while our detection and robust inference strategies significantly enhance resilience. This work establishes the first comprehensive framework for adversarial membership manipulation in vision models.
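
The abstract's gradient-norm collapse signature suggests a simple detector shape: trace how the per-sample parameter-gradient norm behaves as the query is nudged by small perturbations. The sketch below approximates one such trajectory; the radii, sampling scheme, and rebound-ratio score are assumptions for illustration, not the paper's exact detection rule.

```python
# Hedged sketch of a gradient-geometry detector. Fabricated inputs sit
# at artificially sharp loss minima, so their gradient norm collapses
# at the query point but rebounds under small random nudges; true
# members tend to stay uniformly low. All hyperparameters are assumed.
import torch
import torch.nn.functional as F

def grad_norm(model, x, y):
    """L2 norm of the loss gradient w.r.t. model parameters at (x, y)."""
    loss = F.cross_entropy(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()

def collapse_score(model, x, y, radii=(0.0, 1/255, 2/255, 4/255), n_samples=4):
    """Average gradient norm over growing perturbation radii, summarized
    as a rebound ratio. A large ratio marks the collapse-then-rebound
    trajectory and flags the query as likely fabricated."""
    traj = []
    for r in radii:
        reps = n_samples if r > 0 else 1
        norms = []
        for _ in range(reps):
            noise = torch.empty_like(x).uniform_(-r, r)
            norms.append(grad_norm(model, (x + noise).clamp(0.0, 1.0), y))
        traj.append(sum(norms) / len(norms))
    return traj[-1] / (traj[0] + 1e-12)
```

In this sketch a score near 1 means the gradient norm is stable around the query (consistent with a true member), while a score far above 1 indicates the abrupt collapse-and-rebound pattern attributed to fabricated members.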
Problem

Research questions and friction points this paper is trying to address.

membership inference attacks
adversarial robustness
privacy leakage
vision models
adversarial manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial membership manipulation
membership inference attacks
gradient-geometry signature
robust inference framework
privacy leakage