🤖 AI Summary
Evaluating strong unlearning in deep neural networks—i.e., verifying whether a model after unlearning is statistically indistinguishable from one never exposed to the to-be-forgotten data—remains challenging. Existing black-box methods, relying solely on output-layer behavior, lack sensitivity to residual information persisting in intermediate layers.
Method: We propose the first white-box evaluation metric, the Information Difference Index (IDI), grounded in mutual information theory. IDI quantifies the statistical dependence between intermediate-layer representations and the target forget-label, enabling fine-grained detection of lingering memorization.
Contribution/Results: Extensive experiments across diverse architectures (ResNet, ViT) and datasets (CIFAR-10/100, ImageNet subsets) demonstrate that IDI achieves superior sensitivity and cross-model consistency compared to black-box baselines—including membership inference attacks and accuracy-difference metrics. IDI provides an interpretable, reproducible, and theoretically principled benchmark for compliance verification under the “right to be forgotten.”
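To make the idea concrete, here is a minimal, self-contained sketch of an information-difference measurement. It uses a simple histogram (plug-in) mutual information estimator on synthetic "intermediate features" for two hypothetical models: a retrained model whose features carry no forget-label information, and an imperfectly unlearned model with one feature dimension still correlated with the forget labels. The estimator, the synthetic data, and the final normalization are all illustrative assumptions; the paper's actual IDI estimator and normalization may differ.

```python
import numpy as np

def discrete_mi(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) between one
    scalar feature (binned into `bins` buckets) and integer labels."""
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]   # interior edges
    xb = np.digitize(x, edges)                           # bin index 0..bins-1
    joint = np.zeros((bins, int(y.max()) + 1))
    np.add.at(joint, (xb, y), 1)                         # joint histogram
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    # MI = KL(joint || product of marginals), so it is always >= 0.
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def avg_feature_mi(features, labels):
    """Average MI between each feature dimension and the labels."""
    return float(np.mean([discrete_mi(features[:, j], labels)
                          for j in range(features.shape[1])]))

rng = np.random.default_rng(0)
forget_labels = rng.integers(0, 2, size=2000)

# Hypothetical intermediate features (2000 samples, 16 dims):
# the retrained model's features are label-independent noise; the
# unlearned model's first dimension retains residual label information.
feat_retrained = rng.normal(size=(2000, 16))
feat_unlearned = feat_retrained.copy()
feat_unlearned[:, 0] += 2.0 * forget_labels

mi_u = avg_feature_mi(feat_unlearned, forget_labels)
mi_r = avg_feature_mi(feat_retrained, forget_labels)

# Illustrative difference index: excess MI of the unlearned model over
# the retrained reference, normalized to [0, 1]. Not the paper's formula.
idi = (mi_u - mi_r) / max(mi_u, 1e-12)
print(f"avg MI unlearned={mi_u:.4f}  retrained={mi_r:.4f}  index={idi:.3f}")
```

An index near 0 would indicate the unlearned model's intermediate features hold no more forget-label information than the retrained reference, while a value near 1 flags substantial residual memorization that output-only (black-box) metrics could miss.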
📝 Abstract
Machine unlearning (MU) aims to remove the influence of specific data from trained models, addressing privacy concerns and ensuring compliance with regulations such as the “right to be forgotten.” Evaluating strong unlearning, where the unlearned model is indistinguishable from one retrained without the forget data, remains a significant challenge in deep neural networks (DNNs). Common black-box metrics, such as variants of membership inference attacks and accuracy comparisons, primarily assess model outputs but often fail to capture residual information in intermediate layers. To bridge this gap, we introduce the Information Difference Index (IDI), a novel white-box metric inspired by information theory. IDI quantifies retained information in intermediate features by measuring mutual information between those features and the labels to be forgotten, offering a more comprehensive assessment of unlearning efficacy. Our experiments demonstrate that IDI effectively measures the degree of unlearning across various datasets and architectures, providing a reliable tool for evaluating strong unlearning in DNNs.