Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the privacy leakage risk introduced by machine unlearning, proposing the first black-box, a posteriori membership inference attack that determines whether a given sample has been unlearned using only the model's label predictions, without access to the original model, gradients, or confidence scores. The method models the label-distribution shift induced by unlearning and combines statistical significance testing with an adaptive thresholding mechanism to achieve high-precision identification under a strict threat model. Experiments across multiple benchmark datasets and state-of-the-art unlearning algorithms demonstrate an average precision of 92.3%, significantly outperforming existing baselines. The approach generalizes well in practice and is the first to enable efficient identification of unlearned samples under the stringent constraint of label-only outputs.

📝 Abstract
Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.
Problem

Research questions and friction points this paper is trying to address.

Infer unlearned data samples with minimal model access
Address privacy risks in Machine Unlearning systems
Enhance attack feasibility under strict threat models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Label-only membership inference attack
Strict threat model access
High precision unlearned sample detection
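The label-only setting described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual algorithm: the names `infer_unlearned`, `predict`, and the fixed flip-rate threshold are illustrative assumptions. The intuition is that after a sample is unlearned, the model's hard-label predictions on the sample and its perturbed variants tend to shift away from the ground-truth label, so a high flip rate is evidence that the sample was unlearned.

```python
# Hypothetical sketch of a label-only unlearning membership test
# (illustrative only; the paper's Apollo attack additionally uses
# statistical significance testing and adaptive thresholding).
from typing import Any, Callable, Sequence


def infer_unlearned(
    predict: Callable[[Any], int],  # label-only access to the target model
    variants: Sequence[Any],        # the sample plus perturbed copies of it
    true_label: int,                # ground-truth label known to the adversary
    threshold: float = 0.5,         # flip-rate threshold, calibrated offline
) -> bool:
    """Return True if the sample is inferred to have been unlearned."""
    # Count how many variants the model no longer labels correctly.
    flips = sum(1 for v in variants if predict(v) != true_label)
    flip_rate = flips / len(variants)
    # A large shift away from the true label suggests unlearning.
    return flip_rate >= threshold
```

In practice the threshold would be calibrated on shadow models rather than fixed, and the perturbations chosen so that a model still trained on the sample labels most variants correctly.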