🤖 AI Summary
This paper shows that membership inference tests for language models remain vulnerable to data poisoning even under relaxed, semantic definitions of membership, in which semantically similar samples also count as members. An attacker can inject carefully crafted poisoned training data that causes a test to produce incorrect predictions for a target point.
Method: The authors theoretically establish a fundamental trade-off between a membership inference test's accuracy and its robustness to poisoning, and present a concrete semantic-aware poisoning attack that jointly optimizes gradient manipulation and semantic similarity constraints, degrading inference performance while preserving model utility.
Contribution/Results: Empirical evaluation shows that state-of-the-art membership inference tests fall well below the 50% random baseline under the proposed attack. This exposes a structural vulnerability in real-world deployments of membership inference, offering theoretical insight for trustworthy AI assessment and a semantically grounded setting for adversarial evaluation.
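To make the relaxed membership notion concrete, here is a minimal sketch of one common way to operationalize it: a candidate counts as a member if it lies within a cosine-similarity threshold of some training sample in embedding space. The embedding vectors, the threshold `tau`, and the function names are illustrative assumptions, not the paper's definition.

```python
import math

def cosine_similarity(u, v):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def is_semantic_member(candidate_emb, train_embs, tau):
    # Relaxed (semantic-neighborhood) membership: the candidate is a
    # "member" if ANY training embedding is at least tau-similar to it.
    return any(cosine_similarity(candidate_emb, e) >= tau for e in train_embs)

# Toy illustration with hypothetical 2-D embeddings.
train = [[0.98, 0.2], [0.0, 1.0]]
print(is_semantic_member([1.0, 0.0], train, tau=0.9))   # near-duplicate of first
print(is_semantic_member([-1.0, 0.0], train, tau=0.9))  # far from everything
```

Under exact-match membership, only verbatim training strings count; this relaxation is what lets paraphrases and near-duplicates count as members too.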
📝 Abstract
Membership inference tests aim to determine whether a particular data point was included in a language model's training set. However, recent works have shown that such tests often fail under the strict definition of membership based on exact matching, and have suggested relaxing this definition to include semantic neighbors as members as well. In this work, we show that membership inference tests are still unreliable under this relaxation: it is possible to poison the training dataset in a way that causes the test to produce incorrect predictions for a target point. We theoretically reveal a trade-off between a test's accuracy and its robustness to poisoning. We also present a concrete instantiation of this poisoning attack and empirically validate its effectiveness. Our results show that it can degrade the performance of existing tests to well below random.
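For readers unfamiliar with membership inference, a minimal sketch of the classic loss-threshold test may help: predict "member" when the model's loss on a sample falls below a calibrated threshold, since models tend to fit their training data better than unseen data. This is a generic baseline attack, not the tests or the poisoning method studied in this paper; the per-token log-probabilities and threshold below are made-up illustrations.

```python
def sequence_loss(token_log_probs):
    # Average negative log-likelihood of a token sequence, given
    # per-token log-probabilities from a language model.
    return -sum(token_log_probs) / len(token_log_probs)

def loss_threshold_mia(token_log_probs, threshold):
    # Loss-threshold membership inference: low loss -> predict "member".
    return sequence_loss(token_log_probs) < threshold

# Toy illustration with invented per-token log-probabilities.
member_like = [-0.1, -0.2, -0.05]    # low loss: model "knows" this text
nonmember_like = [-2.3, -1.9, -2.8]  # high loss: text looks unseen

print(loss_threshold_mia(member_like, threshold=1.0))     # True
print(loss_threshold_mia(nonmember_like, threshold=1.0))  # False
```

The poisoning attack in this paper targets exactly this kind of signal: by injecting crafted training data, an adversary can shift a target point's loss statistics so that such tests predict the wrong label.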