VTarbel: Targeted Label Attack with Minimal Knowledge on Detector-enhanced Vertical Federated Learning

📅 2025-07-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing targeted label-flipping attacks against detector-enhanced vertical federated learning (VFL) systems rely on access to model outputs and ground-truth labels, leaving them easy to flag with anomaly detectors. Method: We propose a two-stage black-box targeted attack framework that requires only local data and query access to the detector-enhanced VFL inference service, with no knowledge of true labels or model internals. Our approach employs Maximum Mean Discrepancy (MMD)-based sampling to select highly expressive local samples, constructs a local surrogate model and a detector emulator in the absence of labels or model outputs, and generates stealthy adversarial perturbations via gradient-guided optimization. Contribution/Results: Extensive experiments across four models, seven cross-modal datasets, and two detector types demonstrate that our method significantly outperforms four baselines, evades detection with high success rates, and remains effective against three mainstream privacy-preserving defenses: differential privacy, secure aggregation, and gradient clipping.
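The MMD-based sampling step can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the RBF kernel, the biased MMD estimator, and the greedy selection loop (`select_expressive` is a hypothetical name) are assumptions standing in for whatever criterion the authors use to pick "highly expressive" local samples.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel values between rows of X and rows of Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy between
    # the empirical distributions of X and Y.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

def select_expressive(X, k, gamma=1.0):
    # Greedily pick k samples whose empirical distribution best
    # matches the full local dataset X (i.e. lowest MMD to X).
    chosen, remaining = [], list(range(len(X)))
    for _ in range(k):
        best = min(remaining, key=lambda i: mmd2(X[chosen + [i]], X, gamma))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

A low-MMD subset acts as a distributional proxy for the whole local dataset, so few queries through the VFL protocol suffice to gather informative pseudo-labels.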

📝 Abstract
Vertical federated learning (VFL) enables multiple parties with disjoint features to collaboratively train models without sharing raw data. While the privacy vulnerabilities of VFL are extensively studied, its security threats, particularly targeted label attacks, remain underexplored. In such attacks, a passive party perturbs inputs at inference time to force misclassification into adversary-chosen labels. Existing methods rely on unrealistic assumptions (e.g., access to the VFL model's outputs) and ignore the anomaly detectors deployed in real-world systems. To bridge this gap, we introduce VTarbel, a two-stage, minimal-knowledge attack framework explicitly designed to evade detector-enhanced VFL inference. During the preparation stage, the attacker selects a minimal set of high-expressiveness samples (via maximum mean discrepancy), submits them through the VFL protocol to collect predicted labels, and uses these pseudo-labels to train an estimated detector and a surrogate model on local features. In the attack stage, these models guide gradient-based perturbation of the remaining samples, crafting adversarial instances that induce targeted misclassifications and evade detection. We implement VTarbel and evaluate it against four model architectures, seven multimodal datasets, and two anomaly detectors. Across all settings, VTarbel outperforms four state-of-the-art baselines, evades detection, and remains effective against three representative privacy-preserving defenses. These results reveal critical security blind spots in current VFL deployments and underscore the urgent need for robust, attack-aware defenses.
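The attack-stage idea of gradient-guided, stealth-bounded perturbation can be sketched with a linear softmax surrogate and hand-derived gradients. This is a simplified stand-in, assuming an FGSM-style targeted update with an L-infinity budget; `targeted_perturb` and its parameters are hypothetical, and the real attack additionally consults the trained detector emulator.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def targeted_perturb(x, W, b, target, eps=0.1, steps=20, lr=0.05):
    # Iteratively nudge x toward the adversary-chosen target class of a
    # linear softmax surrogate f(x) = softmax(W x + b), while keeping the
    # perturbation within an L-infinity budget eps so it stays stealthy.
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        # Gradient of cross-entropy vs. the target label w.r.t. the input.
        grad = W.T @ (p - onehot)
        x_adv = x_adv - lr * np.sign(grad)        # targeted descent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # enforce stealth budget
    return x_adv
```

In the paper's setting the surrogate (and detector emulator) are trained only from pseudo-labels, so this optimization needs no access to the true VFL model's gradients.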
Problem

Research questions and friction points this paper is trying to address.

Targeted label attacks in VFL remain underexplored
Existing methods ignore the anomaly detectors deployed in real-world systems
Evading detector-enhanced VFL inference with minimal attacker knowledge is an open challenge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage attack framework evading detector-enhanced VFL
Uses maximum mean discrepancy for sample selection
Gradient-based perturbations evade detection effectively
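The preparation-stage step of fitting a local surrogate from queried pseudo-labels can be sketched as plain softmax regression on local features. This is an illustrative assumption: `train_surrogate` is a hypothetical helper, and the paper's surrogate (and detector emulator) may be far richer models.

```python
import numpy as np

def train_surrogate(X, pseudo_labels, n_classes, lr=0.5, epochs=200):
    # Fit a linear softmax surrogate on local features X, using the
    # pseudo-labels obtained by submitting selected samples through the
    # VFL inference protocol (no ground-truth labels are needed).
    n, d = X.shape
    W = np.zeros((n_classes, d))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[pseudo_labels]           # one-hot targets
    for _ in range(epochs):
        logits = X @ W.T + b
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                            # softmax cross-entropy gradient
        W -= lr * (G.T @ X)                        # batch gradient descent
        b -= lr * G.sum(axis=0)
    return W, b
```

Once fitted, this surrogate supplies the gradients that guide perturbation of the remaining samples, closing the loop between the two stages.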
Juntao Tan
Research Scientist, Salesforce
Machine Learning, Explainable AI, Recommendation Systems, Information Retrieval

Anran Li
Yale University
Trustworthy AI, Medical LLMs, Federated Learning

Quanchao Liu
Department of Security Technology Research, China Mobile Research Institute, China

Peng Ran
Department of Security Technology Research, China Mobile Research Institute, China

Lan Zhang
University of Science and Technology of China, China