🤖 AI Summary
This work identifies a novel privacy threat in Public Dataset-Assisted Federated Distillation (PDA-FD): under the honest-but-curious server assumption, the server can infer the label distributions and membership status of clients' private training data by analyzing their inference outputs on public data. The paper presents the first systematic privacy analysis of PDA-FD, proposing two targeted attacks: (1) a KL-divergence-based label distribution inference attack, and (2) an enhanced Likelihood Ratio Attack (LiRA) for membership inference. Extensive experiments across three representative PDA-FD frameworks (FedMD, DS-FL, and Cronus) show state-of-the-art attack performance: the label distribution attack achieves minimal KL divergence between inferred and true distributions, while the membership inference attack attains high true positive rates at low false positive rates. The findings establish both theoretical foundations and practical benchmarks for privacy risk assessment in PDA-FD, highlighting critical vulnerabilities previously overlooked in this paradigm.
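The KL divergence reported for the label distribution attack measures how closely the server's inferred distribution matches a client's true label distribution. A minimal sketch of that evaluation metric (the distributions below are hypothetical placeholders, not values from the paper):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions.

    A small epsilon guards against log(0) / division by zero for
    classes that one distribution assigns no mass to.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()  # renormalize after smoothing
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical example: inferred vs. true label distribution over 3 classes.
# Lower KL means the inference attack recovered the distribution more closely.
true_dist = [0.5, 0.3, 0.2]
inferred  = [0.48, 0.33, 0.19]
score = kl_divergence(inferred, true_dist)
```

A perfect inference would give a KL divergence of zero; the attack's success is reported as how close to zero this value gets.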
📝 Abstract
Federated Distillation (FD) has emerged as a popular federated training framework, enabling clients to collaboratively train models without sharing private data. Public Dataset-Assisted Federated Distillation (PDA-FD), which leverages public datasets for knowledge sharing, has become widely adopted. Although PDA-FD enhances privacy compared to traditional Federated Learning, we demonstrate that the use of public datasets still poses significant privacy risks to clients' private training data. This paper presents the first comprehensive privacy analysis of PDA-FD in the presence of an honest-but-curious server. We show that the server can exploit clients' inference results on public datasets to extract two critical types of private information: the label distribution and the membership status of samples in the private training dataset. To quantify these vulnerabilities, we introduce two novel attacks specifically designed for the PDA-FD setting: a label distribution inference attack and membership inference methods based on the Likelihood Ratio Attack (LiRA). Through extensive evaluation of three representative PDA-FD frameworks (FedMD, DS-FL, and Cronus), our attacks achieve state-of-the-art performance, with label distribution attacks reaching minimal KL divergence and membership inference attacks maintaining high True Positive Rates under low False Positive Rate constraints. Our findings reveal significant privacy risks in current PDA-FD frameworks and emphasize the need for more robust privacy protection mechanisms in collaborative learning systems.
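The LiRA-style membership inference referenced above scores a candidate sample by comparing the target model's confidence against two Gaussians fitted to shadow-model confidences: one for shadow models trained *with* the sample, one for shadow models trained *without* it. A minimal sketch of that likelihood-ratio score, under standard LiRA assumptions (the shadow confidences and function names below are illustrative, not the paper's implementation):

```python
import numpy as np

def _logit(c, eps=1e-6):
    # Logit-scale confidences so they are approximately Gaussian (LiRA trick)
    c = np.clip(np.asarray(c, dtype=float), eps, 1 - eps)
    return np.log(c / (1 - c))

def _gauss_logpdf(x, mu, sd):
    # Log-density of a univariate Gaussian
    return -np.log(sd) - 0.5 * np.log(2 * np.pi) - (x - mu) ** 2 / (2 * sd ** 2)

def lira_score(target_conf, in_confs, out_confs, eps=1e-6):
    """Likelihood-ratio membership score: positive = more likely a member.

    target_conf: observed confidence of the attacked model on the sample;
    in_confs / out_confs: shadow-model confidences with the sample in / out
    of the shadow training sets (hypothetical inputs for illustration).
    """
    t = _logit(target_conf)
    li, lo = _logit(in_confs), _logit(out_confs)
    mu_in, sd_in = li.mean(), li.std() + eps
    mu_out, sd_out = lo.mean(), lo.std() + eps
    # Log-likelihood ratio of the "member" vs. "non-member" hypotheses
    return float(_gauss_logpdf(t, mu_in, sd_in) - _gauss_logpdf(t, mu_out, sd_out))

# Hypothetical shadow statistics: members get high confidence, non-members lower.
score_member = lira_score(0.97, [0.95, 0.98, 0.96], [0.60, 0.70, 0.65])
score_nonmember = lira_score(0.60, [0.95, 0.98, 0.96], [0.60, 0.70, 0.65])
```

Thresholding this score at different values traces out the TPR/FPR trade-off that the evaluation reports, in particular the true positive rate at very low false positive rates.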