🤖 AI Summary
Existing machine unlearning methods in vertical federated learning (VFL) certify forgetting using only output-level metrics, which often fail to show whether sensitive information has actually been erased from internal representations. This work proposes Mirage, a framework that systematically audits unlearning efficacy at the representation level. Mirage reveals a significant gap between output-level and representation-level unlearning, uncovers an “unlearning trilemma,” and identifies an asymmetry between class-level and sample-level forgetting. The framework integrates four complementary techniques: Linear Probe Recovery (LPR), Centered Kernel Alignment (CKA), feature separability scoring, and layer-wise recovery analysis. Together they form a representation-aware auditing pipeline. Experiments across seven datasets and seven baseline methods show pervasive shortcomings in current approaches: after class-level unlearning, LPR accuracy remains as high as 97%, whereas after sample-level unlearning it is indistinguishable from random chance (~50%). These results underscore the need for representation-aware evaluation standards for unlearning.
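To make the idea behind Linear Probe Recovery concrete, the following is a minimal sketch, not the paper's implementation. It assumes the probe is a binary logistic regression trained on frozen features, and it uses synthetic Gaussian features in place of real model representations. The function name `linear_probe_accuracy`, the hyperparameters, and the `leaky`/`clean` data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_probe_accuracy(train_x, train_y, test_x, test_y, steps=500, lr=0.1):
    """Fit a logistic-regression probe on frozen features; return test accuracy."""
    # Standardize with training statistics so one learning rate suits all dims.
    mean, std = train_x.mean(axis=0), train_x.std(axis=0) + 1e-8
    train_x, test_x = (train_x - mean) / std, (test_x - mean) / std
    w = np.zeros(train_x.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(train_x @ w + b)))
        grad = probs - train_y                      # gradient of the log loss w.r.t. logits
        w -= lr * (train_x.T @ grad) / len(train_y)
        b -= lr * grad.mean()
    preds = (test_x @ w + b) > 0
    return float((preds == test_y.astype(bool)).mean())

# Synthetic stand-ins for representations (assumption, not real model features).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)              # 1 = "forgotten" class, 0 = other
leaky = rng.normal(size=(1000, 8)) + 2.0 * labels[:, None]  # class structure retained
clean = rng.normal(size=(1000, 8))                           # no class structure left

acc_leaky = linear_probe_accuracy(leaky[:800], labels[:800], leaky[800:], labels[800:])
acc_clean = linear_probe_accuracy(clean[:800], labels[:800], clean[800:], labels[800:])
print(f"probe accuracy, leaky features: {acc_leaky:.2f}")
print(f"probe accuracy, clean features: {acc_clean:.2f}")
```

In this toy setup, a probe accuracy well above chance on the "leaky" features plays the role of a high LPR score: the label is still linearly recoverable from the representation even if the model's outputs no longer reveal it.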
📝 Abstract
Machine unlearning in Vertical Federated Learning (VFL) has attracted growing interest, yet existing methods certify forgetting solely using output-level metrics. We challenge these claims by introducing Mirage, a representation-level auditing framework comprising four complementary diagnostics: Linear Probe Recovery (LPR), Centered Kernel Alignment (CKA), Feature Separability Scoring, and Layer-Wise Recovery Analysis. Through experiments across seven datasets and seven baseline methods following recent VFL unlearning protocols, Mirage reveals three key findings: (i) Forgetting gap: methods that pass output-level certification still retain substantial class structure in their representations, with LPR exceeding the retrained baseline by up to 15.4 points; CKA shows these models remain structurally closer to the original than to the retrained reference, while separability scores indicate persistent geometric discrimination. (ii) Unlearning trilemma: no existing method simultaneously achieves high utility, output-level forgetting, and representation-level forgetting. (iii) Class-sample asymmetry: class-level forgetting leaves strong representational traces (LPR up to 97%), whereas sample-level forgetting is indistinguishable from chance (LPR approx. 50%); layer-wise analysis further shows residual class information persists across network depths. These findings call for representation-aware evaluation standards in federated unlearning research.
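The CKA comparison described above can be illustrated with a short sketch. It assumes the linear variant of Centered Kernel Alignment (the abstract does not say which kernel Mirage uses) and substitutes synthetic feature matrices for real model activations. The names `feats_original`, `feats_unlearned`, and `feats_retrained` are hypothetical.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, dim)."""
    # Center each feature dimension across samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Synthetic stand-ins for layer activations on the same 200 inputs (assumption).
rng = np.random.default_rng(0)
feats_original = rng.normal(size=(200, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))    # random orthogonal matrix
# "Unlearned" model: a rotated, lightly perturbed copy of the original features.
feats_unlearned = feats_original @ rotation + 0.05 * rng.normal(size=(200, 16))
# "Retrained" reference: independent features.
feats_retrained = rng.normal(size=(200, 16))

cka_to_original = linear_cka(feats_unlearned, feats_original)
cka_to_retrained = linear_cka(feats_unlearned, feats_retrained)
print(f"CKA(unlearned, original):  {cka_to_original:.3f}")
print(f"CKA(unlearned, retrained): {cka_to_retrained:.3f}")
```

Linear CKA is invariant to orthogonal transformations of the feature space, so the rotated copy still scores close to 1 against the original. An unlearned model whose CKA to the original exceeds its CKA to the retrained reference matches the pattern the abstract reports as evidence of incomplete representation-level forgetting.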