🤖 AI Summary
In vertical federated learning (VFL), internal passive label inference attacks can reconstruct private labels by exploiting gradients and semantic embeddings—even with only a few auxiliary labels—causing large-scale privacy leakage. Existing defenses typically target isolated leakage channels and thus fail against multi-source, composite attacks. To address this, we propose LADSG, a unified defense framework that jointly mitigates gradient-, embedding-, and label-level leakage without encryption or strong assumptions. LADSG integrates three novel components: (i) gradient semantic similarity substitution via similar-subspace projection, (ii) label distillation-based anonymization, and (iii) lightweight anomaly detection with privacy-utility trade-off optimization. Evaluated on six real-world datasets, LADSG reduces attack success rates by 30–60% while increasing computational overhead by less than 3%, significantly enhancing both robustness and practicality of VFL systems.
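The gradient-substitution idea in component (i) can be sketched in a deliberately simplified, hypothetical form: each sample's true gradient is replaced with the gradient of a semantically similar sample, chosen among its nearest neighbors in embedding space. The function name, the cosine-similarity neighbor search, and the random pick below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def substitute_gradients(grads, embeds, k=5):
    """Replace each sample's gradient with the gradient of one of its k
    most similar samples in embedding space (hypothetical simplification
    of LADSG-style gradient semantic similarity substitution)."""
    # cosine similarity between all pairs of embeddings
    normed = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # never substitute a sample with itself
    out = np.empty_like(grads)
    for i in range(len(grads)):
        neighbors = np.argsort(sim[i])[-k:]    # k most similar samples
        out[i] = grads[rng.choice(neighbors)]  # substituted gradient
    return out

# toy batch: 8 samples, 4-dim gradients, 16-dim embeddings
grads = rng.normal(size=(8, 4))
embeds = rng.normal(size=(8, 16))
subbed = substitute_gradients(grads, embeds)
print(subbed.shape)  # (8, 4)
```

Because every transmitted gradient now belongs to a *different* but semantically similar sample, a passive party matching gradients to labels is misled while the aggregate training signal stays close to the original.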
📝 Abstract
Vertical federated learning (VFL) has become a key paradigm for collaborative machine learning, enabling multiple parties to train models over distributed feature spaces while preserving data privacy. Although security protocols such as gradient masking and encryption defend against external attacks by preventing unauthorized access to sensitive data, label inference attacks mounted from within the system have recently emerged. These attacks exploit gradients and semantic embeddings to reconstruct private labels, bypassing traditional defenses. For example, the passive label inference attack can reconstruct tens of thousands of participants' private labels using just 40 auxiliary labels, posing a significant security threat. Existing defenses each address a single leakage pathway, such as gradient leakage or label exposure, and their limitations become clear as attack strategies evolve, especially against hybrid attacks that combine multiple vectors. To address this, we propose Label-Anonymized Defense with Substitution Gradient (LADSG), a unified defense framework that integrates gradient substitution, label anonymization, and anomaly detection. LADSG mitigates both gradient and label leakage while preserving the scalability and efficiency of VFL. Experiments on six real-world datasets show that LADSG reduces label inference attack success rates by 30–60% with minimal computational overhead, underscoring the value of lightweight defenses in securing VFL.
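The label-anonymization component can likewise be sketched with a standard knowledge-distillation device: instead of sharing hard one-hot labels, the active party shares temperature-softened teacher outputs. The function name, temperature value, and toy logits below are illustrative assumptions, not LADSG's exact anonymization rule.

```python
import numpy as np

def anonymize_labels(teacher_logits, temperature=4.0):
    """Turn teacher logits into softened probability vectors
    (distillation-style soft labels) so the ground-truth one-hot
    labels are never exposed. Hypothetical illustration."""
    z = teacher_logits / temperature
    z -= z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# toy 3-class logits for two samples
teacher_logits = np.array([[4.0, 1.0, 0.5],
                           [0.2, 3.5, 1.0]])
soft = anonymize_labels(teacher_logits)
print(soft.round(3))
```

The soft labels keep the argmax (so the bottom model still learns the right class ranking) but flatten the distribution, making exact label recovery from gradients harder.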