🤖 AI Summary
Problem: Existing sample-level model unlearning evaluation methods incur high computational overhead and lack fine-grained quantification of approximate unlearning.
Method: This paper proposes the Interpolated Approximate Measurement (IAM) framework—the first to jointly quantify exact and approximate unlearning. IAM constructs a lightweight interpolation model grounded in the gap between a model's generalization and fitting behaviors, requiring only a single pre-trained shadow model for efficient evaluation. It provides theoretical guarantees on scalability, notably for large language models. The paper further designs a sample-level unlearning score and a theory-driven scoring mechanism to enable efficient, interpretable assessment of unlearning completeness.
Results: Experiments demonstrate that IAM systematically uncovers both over-unlearning and under-unlearning risks across diverse approximate unlearning algorithms; its binary detection accuracy matches online attack baselines, while its correlation with approximate-unlearning completeness improves substantially. IAM establishes the first quantifiable, deployable paradigm for sample-level unlearning integrity evaluation, directly supporting privacy compliance.
📝 Abstract
Growing concerns over data privacy and security highlight the importance of machine unlearning--removing specific data influences from trained models without full retraining. Techniques like Membership Inference Attacks (MIAs) are widely used to externally assess successful unlearning. However, existing methods face two key limitations: (1) maximizing MIA effectiveness (e.g., via online attacks) requires prohibitive computational resources, often exceeding retraining costs; (2) MIAs, designed for binary inclusion tests, struggle to capture granular changes in approximate unlearning. To address these challenges, we propose the Interpolated Approximate Measurement (IAM), a framework natively designed for unlearning inference. IAM quantifies sample-level unlearning completeness by interpolating the model's generalization-fitting behavior gap on queried samples. IAM achieves strong performance in binary inclusion tests for exact unlearning and high correlation for approximate unlearning--scalable to LLMs using just one pre-trained shadow model. We theoretically analyze how IAM's scoring mechanism maintains performance efficiently. We then apply IAM to recent approximate unlearning algorithms, revealing general risks of both over-unlearning and under-unlearning, underscoring the need for stronger safeguards in approximate unlearning systems. The code is available at https://github.com/Happy2Git/Unlearning_Inference_IAM.
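To make the interpolation idea concrete, here is a rough, hypothetical sketch of a sample-level score that places a target model's behavior on a queried sample between two reference behaviors: a shadow model's generalization output (a model that never trained on the sample) and a fully fitted output. This is an illustrative simplification under assumed names (`iam_score`, `p_target`, `p_shadow`, `p_fit`), not the paper's exact formula:

```python
def iam_score(p_target: float, p_shadow: float,
              p_fit: float = 1.0, eps: float = 1e-8) -> float:
    """Hypothetical interpolation-based unlearning-completeness score.

    p_target: target model's confidence on the queried sample
    p_shadow: shadow model's confidence (generalization behavior,
              i.e., a model that never saw the sample)
    p_fit:    confidence of a model that fully fit the sample (~1.0)

    Returns a value in [0, 1]: 0 means the target behaves like a
    model that never saw the sample (fully unlearned), 1 means it
    behaves like a model that fully memorized it.
    """
    gap = max(p_fit - p_shadow, eps)          # avoid division by zero
    memorization = (p_target - p_shadow) / gap
    return min(max(memorization, 0.0), 1.0)   # clip to [0, 1]
```

Such a score is continuous rather than binary, which is what lets it track the granular changes that approximate unlearning induces, instead of the inclusion/exclusion decision an MIA produces.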